DIH should be able to read data directly from HDFS for indexing
------------------------------------------------------------

                 Key: SOLR-2096
                 URL: https://issues.apache.org/jira/browse/SOLR-2096
             Project: Solr
          Issue Type: New Feature
          Components: contrib - DataImportHandler
    Affects Versions: 1.4.1
            Reporter: Amit Nithian
             Fix For: 1.4.2
         Attachments: hdfs_reader.tar

DIH doesn't support reading from the hdfs:// protocol, which makes it hard to index data generated by a MapReduce (M/R) job. The attached tarball (hdfs_reader.tar) contains a subclass of URLDataSource along with an HDFSReader that adds this support. The data is assumed to be plain text that can be processed by the LineEntityProcessor.

Here is an example DIH config snippet:

    <dataConfig>
      <dataSource name="queryData"
                  type="org.apache.solr.handler.dataimport.hdfs.HDFSDataSource"
                  baseUrl="hdfs://<YOURSERVER>:9000/"
                  encoding="UTF-8"
                  connectionTimeout="5000"
                  readTimeout="10000"/>
      <document name="autoSuggester">
        <entity name="jc"
                processor="LineEntityProcessor"
                url="<YOUR FOLDER>/part*"
                dataSource="queryData">
          <!-- Field mappings here if necessary -->
        </entity>
      </document>
    </dataConfig>
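To illustrate what this configuration asks DIH to do, here is a minimal local-filesystem analogue written against the JDK only. It is a hypothetical sketch, not the attached HDFSDataSource/HDFSReader code: it assumes the reader expands the "part*" pattern over the files in the folder (as in typical M/R output such as part-00000, part-00001) and that, as with LineEntityProcessor, each text line becomes one row with a single "rawLine" field. The class name PartFileLineRows and method readRows are made up for this example; a real HDFS version would read through Hadoop's FileSystem API instead of java.nio.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical local analogue of HDFSDataSource + LineEntityProcessor:
 * expand a glob such as "part*" inside a folder and emit one row per text
 * line, stored under the "rawLine" key. Illustration only, not the patch code.
 */
public class PartFileLineRows {

    public static List<Map<String, Object>> readRows(Path folder, String glob) {
        // Collect matching regular files first so they can be read in a
        // stable, sorted order (part-00000, part-00001, ...).
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(folder, glob)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    files.add(p);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(files);

        // Read each file as UTF-8 text (matching encoding="UTF-8" above) and
        // turn every line into a row with a single "rawLine" field.
        List<Map<String, Object>> rows = new ArrayList<>();
        for (Path file : files) {
            try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    Map<String, Object> row = new HashMap<>();
                    row.put("rawLine", line);
                    rows.add(row);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Folder defaults to the current directory if none is given.
        Path folder = Paths.get(args.length > 0 ? args[0] : ".");
        for (Map<String, Object> row : readRows(folder, "part*")) {
            System.out.println(row.get("rawLine"));
        }
    }
}
```

For example, after copying job output to a local folder with `hadoop fs -get`, running `java PartFileLineRows <folder>` prints every line of the part files in order, while files such as _SUCCESS are skipped because they don't match "part*".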