Re: TikaEntityProcessor not working?
Erick, I need your help on this. I'm still waiting for a resolution, so please help.

-- 
View this message in context: http://lucene.472066.n3.nabble.com/TikaEntityProcessor-not-working-tp856965p3524881.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: TikaEntityProcessor not working?
Sorry, but I don't really have that info.

Erick

On Mon, Nov 21, 2011 at 9:37 AM, kumar8anuj <kumar.an...@gmail.com> wrote:
> Erick, Need your help on this. Waiting for resolution. Please help ...
Re: TikaEntityProcessor not working?
So where can I get some information on this issue? Can you please help?

Anuj Kumar

On Mon, Nov 21, 2011 at 8:17 PM, Erick Erickson wrote:
> Sorry, but I don't really have that info.
> Erick
Re: TikaEntityProcessor not working?
On Mon, Nov 21, 2011 at 8:45 PM, kumar8anuj <kumar.an...@gmail.com> wrote:
> So where can i get some information on this issue, Can you please help ?

Have you tried simple things like searching Google, using the Tika site, and, failing these, asking on a Tika-specific mailing list? No offence, but you might do some basic homework yourself.

* Tika: I am not sure how well supported 0.6 is nowadays, but http://tika.apache.org/0.6/gettingstarted.html seems to indicate that version 3.6 of POI is needed. You should also consider switching to a newer version of Tika.
* If that does not work, please try joining a Tika mailing list and asking a more specific question there: http://tika.apache.org/mail-lists.html

Regards,
Gora
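Gora's first point can be checked mechanically by looking at the jar names Solr actually loads. Below is a minimal Java sketch, assuming the jars follow the usual artifact-version.jar naming (for example tika-parsers-0.6.jar, poi-3.7.jar) and that you collect the names yourself from Solr's lib directories. The class and method names are hypothetical, and the only compatibility rule it encodes is the one from this thread (Tika 0.6 expects POI 3.6). It also flags leftover duplicates, such as a 0.8 snapshot sitting next to a 0.6 jar.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: audits Tika/POI jar file names, e.g. collected from Solr's lib directories. */
class TikaPoiJarAudit {
    // Artifact name, a dotted numeric version, then an optional suffix such as "-FINAL" or "-SNAPSHOT".
    private static final Pattern JAR = Pattern.compile(
        "^(tika-core|tika-parsers|poi|poi-ooxml|poi-scratchpad)-(\\d+(?:\\.\\d+)*)(?:-.*)?\\.jar$");

    /** Maps each recognised artifact to the sorted set of versions present. */
    public static Map<String, TreeSet<String>> versions(List<String> jarNames) {
        Map<String, TreeSet<String>> found = new LinkedHashMap<>();
        for (String name : jarNames) {
            Matcher m = JAR.matcher(name);
            if (m.matches()) {
                found.computeIfAbsent(m.group(1), k -> new TreeSet<>()).add(m.group(2));
            }
        }
        return found;
    }

    /** Returns warnings; an empty list means nothing suspicious was detected. */
    public static List<String> audit(List<String> jarNames) {
        Map<String, TreeSet<String>> found = versions(jarNames);
        List<String> warnings = new ArrayList<>();
        // Two versions of the same artifact on the classpath is a classic source of conflicts.
        for (Map.Entry<String, TreeSet<String>> e : found.entrySet()) {
            if (e.getValue().size() > 1) {
                warnings.add("multiple versions of " + e.getKey() + ": " + e.getValue());
            }
        }
        // The only pairing rule taken from this thread: Tika 0.6 expects POI 3.6.
        TreeSet<String> tikaParsers = found.get("tika-parsers");
        if (tikaParsers != null && tikaParsers.contains("0.6")) {
            for (String artifact : List.of("poi", "poi-ooxml", "poi-scratchpad")) {
                TreeSet<String> v = found.get(artifact);
                if (v != null && !v.equals(new TreeSet<>(List.of("3.6")))) {
                    warnings.add("tika-parsers 0.6 found, but " + artifact + " is " + v + " (expected 3.6)");
                }
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        // Sample names only; pass the real jar names as arguments instead.
        List<String> jars = args.length > 0 ? List.of(args)
            : List.of("tika-core-0.6.jar", "tika-parsers-0.6.jar", "poi-3.7.jar", "poi-ooxml-3.7.jar");
        List<String> warnings = audit(jars);
        warnings.forEach(System.out::println);
        if (warnings.isEmpty()) {
            System.out.println("no obvious Tika/POI conflicts");
        }
    }
}
```

The sample list in main is assumed, loosely mirroring the poi-3.7.jar setup reported in this thread; with it the sketch prints one warning for poi and one for poi-ooxml.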
Re: TikaEntityProcessor not working?
Thanks for the reply, Gora. I tried Googling but didn't find anything on this. I hadn't tried the Tika mailing list; I will post this there now. Thanks for the suggestion.

Anuj Kumar

On Mon, Nov 21, 2011 at 9:10 PM, Gora Mohanty wrote:
> Have you tried simple things like searching Google, using the Tika site, and, failing these, asking on a Tika-specific mailing list?
Re: TikaEntityProcessor not working?
The earlier issue has been resolved, but I am now stuck on something else. Can you tell me which POI jar version works with Tika 0.6? Currently I have poi-3.7.jar. This is the error I am getting:

    SEVERE: Exception while processing: js_logins document : SolrInputDocument[{id=id(1.0)={100984}, complete_mobile_number=complete_mobile_number(1.0)={+91 9600067575}, emailid=emailid(1.0)={vkry...@gmail.com}, full_name=full_name(1.0)={Venkat Ryali}}]:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoSuchMethodError: org.apache.poi.xwpf.usermodel.XWPFParagraph.<init>(Lorg/openxmlformats/schemas/wordprocessingml/x2006/main/CTP;Lorg/apache/poi/xwpf/usermodel/XWPFDocument;)V
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
    Caused by: java.lang.NoSuchMethodError: org.apache.poi.xwpf.usermodel.XWPFParagraph.<init>(Lorg/openxmlformats/schemas/wordprocessingml/x2006/main/CTP;Lorg/apache/poi/xwpf/usermodel/XWPFDocument;)V
        at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator$MyXWPFParagraph.<init>(XWPFWordExtractorDecorator.java:163)
        at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator$MyXWPFParagraph.<init>(XWPFWordExtractorDecorator.java:161)
        at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractTableContent(XWPFWordExtractorDecorator.java:140)
        at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.buildXHTML(XWPFWordExtractorDecorator.java:91)
        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:69)
        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:51)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
        at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:128)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
        ... 7 more
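A NoSuchMethodError like the one above generally means the calling code (here Tika's XWPFWordExtractorDecorator) was compiled against a different version of a library (here POI) than the one loaded at runtime. One way to confirm which jar actually supplied a class is to ask the JVM for the class's code source. Below is a minimal JDK-only sketch; the class name WhichJar is hypothetical, and the answer is only meaningful when it runs with the same jars on the classpath that Solr uses (inside Solr, the webapp's class loaders may see a different set).

```java
import java.security.CodeSource;

/** Hypothetical diagnostic: reports which jar (or directory) a class was loaded from. */
class WhichJar {
    public static String whereLoaded(String className, ClassLoader loader) {
        try {
            // initialize = false: load the class without running its static initializers.
            Class<?> c = Class.forName(className, false, loader);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return className + " -> no code source (JDK/bootstrap class)";
            }
            return className + " -> " + src.getLocation();
        } catch (ClassNotFoundException e) {
            return className + " -> not found";
        } catch (LinkageError e) {
            // e.g. NoClassDefFoundError when one of the class's own dependencies is missing
            return className + " -> failed to load: " + e;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = WhichJar.class.getClassLoader();
        // Defaults: the POI class named in the NoSuchMethodError and the Tika class that called it.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.poi.xwpf.usermodel.XWPFParagraph",
            "org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator"
        };
        for (String n : names) {
            System.out.println(whereLoaded(n, loader));
        }
    }
}
```

If the POI class resolves to a poi-ooxml 3.7 jar while Tika is 0.6, that would be consistent with the version mismatch discussed in this thread.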
Re: TikaEntityProcessor not working?
Erick, I configured the system the same way Brad did, but no documents were indexed and I didn't get any errors in the log either. I then switched to Tika 0.6 and tried again, without success. The table columns are getting indexed, but the document contents are not. Let me know if I'm not being clear.
Re: TikaEntityProcessor not working?
What's not clear is what you are doing to ensure that the file names pulled from your database are actually being read (from disk? from a shared filesystem somewhere?), analyzed, and sent to Solr. Somewhere you need to use the file name and pass it on to one of the processors that will send the *contents* of that file to Solr along with the columns. Since you haven't included your DIH configuration, we can't tell whether you're doing anything like that.

Best
Erick

On Tue, Nov 8, 2011 at 7:15 AM, kumar8anuj <kumar.an...@gmail.com> wrote:
> Erick, As Brad has configured the system, I configured it in the same way and then no document indexing was happening and i was not even getting any errors in the log.
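Erick's first question (are the file names from the database actually being read?) can be answered outside Solr before touching the DIH configuration. Below is a minimal Java sketch, assuming you can dump the filename column and know the base path used in the entity's url attribute. The class and method names are hypothetical and nothing here is part of DIH; it only checks that each name resolves to a readable, non-empty file and does not test Tika extraction itself.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical check: do the file names stored in the database resolve to readable, non-empty files? */
class FileNameCheck {
    public static List<String> problems(Path baseDir, List<String> fileNames) {
        List<String> out = new ArrayList<>();
        for (String name : fileNames) {
            // Mirrors a DIH url such as "/some/path/${document.filename}".
            Path p = baseDir.resolve(name);
            try {
                if (!Files.isRegularFile(p)) {
                    out.add(name + ": missing or not a regular file (" + p + ")");
                } else if (!Files.isReadable(p)) {
                    out.add(name + ": not readable");
                } else if (Files.size(p) == 0) {
                    out.add(name + ": empty file");
                }
            } catch (IOException e) {
                out.add(name + ": " + e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Usage (hypothetical): java FileNameCheck /some/path a.pdf b.pdf ...
        if (args.length < 2) {
            System.out.println("usage: FileNameCheck <baseDir> <fileName>...");
            return;
        }
        List<String> names = Arrays.asList(args).subList(1, args.length);
        List<String> issues = problems(Paths.get(args[0]), names);
        issues.forEach(System.out::println);
        System.out.println(issues.isEmpty() ? "all files look readable" : issues.size() + " problem(s) found");
    }
}
```

If every file checks out here but the contents still don't reach Solr, the problem is more likely in the Tika/DIH side than in the file names.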
Re: TikaEntityProcessor not working?
I tried the same thing, but the problem persists: my document is still not getting indexed. I am using Solr 3.4.0, which came with Tika 0.8. I replaced the Tika core and parser jars with the 0.6 versions, but the document is still not indexed, and nothing related to it shows up in my logs. Please help.
Re: TikaEntityProcessor not working?
You have to provide a lot more information about what you're doing. Are you trying to use DIH? The extracting update request handler? What do your config files look like? Please review: http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

On Mon, Nov 7, 2011 at 8:18 AM, kumar8anuj <kumar.an...@gmail.com> wrote:
> I tried to do the same but problem still persist and my document is not getting indexed.
Re: TikaEntityProcessor not working?
You are my hero. I replaced the Tika 0.8 snapshots that were included with Solr with Tika 0.6, and it works now. Thank you!

Brad

On Jun 3, 2010, at 6:22 AM, David George wrote:
> Which version of Tika do you have?
Re: TikaEntityProcessor not working?
Which version of Tika do you have? There was a problem introduced somewhere between Tika 0.6 and Tika 0.7 whereby the TikaConfig method config.getParsers() returns an empty parser list, due to class loader scope issues when Solr runs under an application server. There is a fix in the Tika 0.8 branch, and I note that a 0.8 snapshot of Tika is included in the Solr trunk. I haven't tried to get this to work and am not sure what configuration is needed. I simply installed Tika 0.6, which can be downloaded from the Apache Tika website.
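The thread does not spell out what changed inside Tika 0.7, so the following is only a generic illustration, in plain JDK code, of how a "class loader scope" problem can make a provider lookup come back empty: java.util.ServiceLoader only finds providers whose META-INF/services registration is visible to the class loader it is given. The Parser interface, the PdfParser class and the temporary directory are made up for the demo; they are not Tika's types, and the sketch does not show Tika's actual fix.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Generic demo (not Tika code): the same lookup, two class loaders, two different answers. */
class ClassLoaderScopeDemo {
    /** Made-up service interface standing in for "a parser". */
    public interface Parser { String name(); }

    /** Made-up provider; public with a public no-arg constructor so ServiceLoader can instantiate it. */
    public static class PdfParser implements Parser {
        public PdfParser() {}
        public String name() { return "pdf"; }
    }

    /** Lists the providers of Parser that are visible through the given class loader. */
    public static List<String> discover(ClassLoader loader) {
        List<String> names = new ArrayList<>();
        for (Parser p : ServiceLoader.load(Parser.class, loader)) {
            names.add(p.name());
        }
        return names;
    }

    /** Creates a temp directory holding only a META-INF/services registration for PdfParser. */
    public static Path servicesDir() throws IOException {
        Path dir = Files.createTempDirectory("services-demo");
        Path file = dir.resolve("META-INF/services/" + Parser.class.getName());
        Files.createDirectories(file.getParent());
        Files.write(file, List.of(PdfParser.class.getName()));
        return dir;
    }

    public static void main(String[] args) throws IOException {
        ClassLoader appLoader = ClassLoaderScopeDemo.class.getClassLoader();
        // The application loader cannot see the registration file, so nothing is found.
        System.out.println("app loader:   " + discover(appLoader));
        URL[] urls = { servicesDir().toUri().toURL() };
        try (URLClassLoader child = new URLClassLoader(urls, appLoader)) {
            // A child loader that also sees the extra directory finds the provider.
            System.out.println("child loader: " + discover(child));
        }
    }
}
```

The point is only that the class files are identical in both lookups; what differs is which loader is asked, which is the kind of scoping difference a library running inside an application server can run into.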
Re: TikaEntityProcessor not working?
It is a file. Only the filename is stored in the database.

Brad

On May 31, 2010, at 2:59 AM, Noble Paul നോബിള് नो ब्ळ् <noble.p...@corp.aol.com> wrote:
> BinFileDataSource will only work with a file. Try FieldStreamDataSource.
>
> On Mon, May 31, 2010 at 3:30 AM, Brad Greenlee <b...@footle.org> wrote:
>> Hi. I'm trying to get Solr to index a database in which one column is the filename of a PDF document I'd like to index. My configuration looks like this:
>>
>>     <dataConfig>
>>       <dataSource name="ds-db" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/document_db" user="user" password="password" readOnly="true"/>
>>       <dataSource name="ds-file" type="BinFileDataSource"/>
>>       <document name="documents">
>>         <entity name="document" dataSource="ds-db" query="select * from documents">
>>           <entity processor="TikaEntityProcessor" url="/some/path/${document.filename}" dataSource="ds-file" format="text">
>>             <field column="text" />
>>           </entity>
>>         </entity>
>>       </document>
>>     </dataConfig>
>>
>> I'm using Solr from trunk (as of two days ago). The import process completes without errors, and it picks up the columns from the database, but not the content from the PDF file. It is definitely trying to access the PDF file: if I give it an incorrect path name, it complains. It doesn't seem to be attempting to index the PDF, though, as it completes in about 40 ms, whereas importing the PDF via the ExtractingRequestHandler takes about 11 seconds.
>>
>> I've also tried the Tika example in example-DIH, and that doesn't seem to index anything either. Am I doing something wrong, or is this just not working yet?
>>
>> Cheers,
>> Brad
>
> -- 
> Noble Paul | Systems Architect | AOL | http://aol.com