On Fri, Nov 20, 2009 at 9:13 PM, javaxmlsoapdev wrote:
>
> Did you extend DIH to do this work? Can you share code samples? I have a
> similar requirement: I need to index database records, and each record
> has a column with a document path, so I need to create another index for
> documents (we allo
View this message in context:
http://old.nabble.com/How-to-use-DataImportHandler-with-ExtractingRequestHandler--tp25267745p26485245.html
Sent from the Solr - User mailing list archive at Nabble.com.
>> However, one of the columns stores the location of a PDF file. How can I
>> configure DataImportHandler to use ExtractingRequestHandler to extract the
>> content of the PDF?
>>
>> Thanks!
>>
>> Khai Doan
>>
>
>
>
--
View this message in context:
http://old.nabble.com/How-to-use-DataImportHandler-with-ExtractingRequestHandler--tp25267745p26443544.html
Sent from the Solr - User mailing list archive at Nabble.com.
Hi Khai,
A few weeks ago, I was facing the same problem.
In my case, this workaround helped (assuming you're using Solr 1.3):
For each row, extract the content from the corresponding PDF file using
a parser library of your choice (I suggest Apache PDFBox or Apache Tika
in case you need to pr
Unfortunately, DIH is not yet integrated with ExtractingRequestHandler.
See https://issues.apache.org/jira/browse/SOLR-1358
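Since DIH cannot call ExtractingRequestHandler directly, the per-row workaround mentioned in this thread (extract the PDF text yourself with a parser library, row by row) can be sketched roughly as below. This is a minimal, stdlib-only Java sketch, not the poster's actual code. The class `PdfRowEnricher`, the column name `pdf_path` and the field name `content` are hypothetical. The extractor is a plain `Function<String, String>` so that a PDFBox- or Tika-based parser could be plugged in; the fake extractor in `main` only stands in for one. How the enriched documents are then sent to Solr is not shown, because that part of the original message is cut off.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch: for each database row, read the PDF path from one
 * column, extract its text with a pluggable extractor, and attach that text
 * as an extra field before the row is indexed.
 */
public class PdfRowEnricher {

    public static List<Map<String, String>> enrich(
            List<Map<String, String>> rows,
            String pathColumn,
            String contentField,
            Function<String, String> extractor) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (Map<String, String> row : rows) {
            // Copy the row so the caller's data is left untouched.
            Map<String, String> doc = new LinkedHashMap<>(row);
            String path = row.get(pathColumn);
            if (path != null && !path.isEmpty()) {
                // Delegate the actual PDF parsing (e.g. PDFBox or Tika) to the extractor.
                doc.put(contentField, extractor.apply(path));
            }
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = new ArrayList<>();
        Map<String, String> withPdf = new LinkedHashMap<>();
        withPdf.put("id", "1");
        withPdf.put("pdf_path", "/docs/a.pdf");
        rows.add(withPdf);
        Map<String, String> withoutPdf = new LinkedHashMap<>();
        withoutPdf.put("id", "2");
        rows.add(withoutPdf);

        // Stand-in extractor for demonstration only; replace with a real PDF parser.
        Function<String, String> fakeExtractor = p -> "text of " + p;
        for (Map<String, String> doc : enrich(rows, "pdf_path", "content", fakeExtractor)) {
            System.out.println(doc);
        }
    }
}
```

Keeping the extractor pluggable means the row-handling logic does not depend on which parser library you pick; rows without a path are passed through unchanged.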
On Thu, Sep 3, 2009 at 5:34 AM, Khai Doan wrote:
> Hi all,
>
> My name is Khai. I have a table in a relational database. I have
> successfully used DataImportHandler
Hi all,
My name is Khai. I have a table in a relational database. I have
successfully used DataImportHandler to import this data into Apache Solr.
However, one of the columns stores the location of a PDF file. How can I
configure DataImportHandler to use ExtractingRequestHandler to extract the
conte