Re: TikaEntityProcessor not working?

2011-11-21 Thread kumar8anuj
Erick,
  I need your help with this; I'm still waiting for a resolution. Please help.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/TikaEntityProcessor-not-working-tp856965p3524881.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: TikaEntityProcessor not working?

2011-11-21 Thread Erick Erickson
Sorry, but I don't really have that info.

Erick




Re: TikaEntityProcessor not working?

2011-11-21 Thread kumar8anuj
So where can I get some information on this issue? Can you please help?

-- 
“The more you are willing to accept responsibility for your actions, the
more credibility you will have”
Anuj Kumar
Ph. No.-09873721510



Re: TikaEntityProcessor not working?

2011-11-21 Thread Gora Mohanty
On Mon, Nov 21, 2011 at 8:45 PM, kumar8anuj kumar.an...@gmail.com wrote:
 So where can i get some information on this issue, Can you please help ?

Have you tried simple things like searching Google, using the Tika
site, and, failing these, asking on a Tika-specific mailing list? No
offence, but you might do some basic homework yourself.
* Tika: Not sure how well supported 0.6 is nowadays, but
   http://tika.apache.org/0.6/gettingstarted.html seems to indicate
   that version 3.6 of POI is needed. You should also consider
   switching to a newer version of Tika.
* If that does not work, please try joining a Tika mailing list, and
  asking a more specific question there:
  http://tika.apache.org/mail-lists.html
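A quick way to see which Tika and POI classes are actually being loaded, and from which jar, is a small standalone Java check run with the same jars on the classpath as Solr. This is only an illustrative sketch, not something suggested in the thread; it relies on standard JDK reflection (`Class.forName`, `getProtectionDomain().getCodeSource()`, `Package.getImplementationVersion()`), and the reported version can be null when a jar's manifest does not declare one.

```java
import java.security.CodeSource;

/**
 * Diagnostic sketch: report which jar (if any) a class is loaded from and the
 * Implementation-Version declared in its package manifest.
 */
final class ClassOrigin {

    /** Returns a one-line description of where className comes from. */
    static String describe(String className) {
        try {
            // initialize=false: load the class without running static initializers.
            Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            // JDK (bootstrap) classes have no CodeSource.
            String location = (source == null || source.getLocation() == null)
                    ? "bootstrap/JDK"
                    : source.getLocation().toString();
            Package pkg = cls.getPackage();
            // Null when the jar manifest declares no Implementation-Version.
            String version = (pkg == null) ? null : pkg.getImplementationVersion();
            return className + " -> " + location + " (Implementation-Version: " + version + ")";
        } catch (ClassNotFoundException e) {
            return className + " -> not on classpath";
        } catch (LinkageError e) {
            // Found, but a dependency (e.g. a superclass) could not be loaded.
            return className + " -> failed to load: " + e;
        }
    }

    public static void main(String[] args) {
        // Class names taken from the stack trace in this thread.
        String[] names = {
            "org.apache.tika.parser.AutoDetectParser",
            "org.apache.tika.parser.microsoft.ooxml.OOXMLParser",
            "org.apache.poi.xwpf.usermodel.XWPFParagraph"
        };
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

The exact classpath to use depends on how your Solr instance is installed.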

Regards,
Gora


Re: TikaEntityProcessor not working?

2011-11-21 Thread kumar8anuj
Thanks for the reply, Gora. I tried Googling but didn't find anything on
this. I hadn't tried the Tika mailing list; I will post this there now.
Thanks for the suggestion.




Re: TikaEntityProcessor not working?

2011-11-14 Thread kumar8anuj
The earlier issue has been resolved, but I am stuck on something else. Can you tell
me which POI jar version works with Tika 0.6? Currently I have
poi-3.7.jar. This is the error I am getting:

SEVERE: Exception while processing: js_logins document :
SolrInputDocument[{id=id(1.0)={100984},
complete_mobile_number=complete_mobile_number(1.0)={+91 9600067575},
emailid=emailid(1.0)={vkry...@gmail.com}, full_name=full_name(1.0)={Venkat
Ryali}}]:org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.NoSuchMethodError:
org.apache.poi.xwpf.usermodel.XWPFParagraph.<init>(Lorg/openxmlformats/schemas/wordprocessingml/x2006/main/CTP;Lorg/apache/poi/xwpf/usermodel/XWPFDocument;)V
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: java.lang.NoSuchMethodError:
org.apache.poi.xwpf.usermodel.XWPFParagraph.<init>(Lorg/openxmlformats/schemas/wordprocessingml/x2006/main/CTP;Lorg/apache/poi/xwpf/usermodel/XWPFDocument;)V
    at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator$MyXWPFParagraph.<init>(XWPFWordExtractorDecorator.java:163)
    at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator$MyXWPFParagraph.<init>(XWPFWordExtractorDecorator.java:161)
    at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractTableContent(XWPFWordExtractorDecorator.java:140)
    at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.buildXHTML(XWPFWordExtractorDecorator.java:91)
    at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:69)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:51)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
    at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:128)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
    ... 7 more
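The JVM descriptor in this error, `(Lorg/openxmlformats/schemas/wordprocessingml/x2006/main/CTP;Lorg/apache/poi/xwpf/usermodel/XWPFDocument;)V` on `XWPFParagraph.<init>`, names a constructor `XWPFParagraph(CTP, XWPFDocument)`: Tika 0.6 was compiled against a POI version that had it, but the `XWPFParagraph` class loaded at runtime does not. A minimal reflection sketch for checking a candidate POI jar for that constructor before deploying it; the helper name `hasConstructor` is hypothetical, and enumerating the constructors may itself throw `NoClassDefFoundError` if the OOXML schema classes are missing from the classpath.

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;

final class ConstructorCheck {

    /**
     * Returns true if className declares a constructor whose parameter types
     * have exactly the given fully qualified names (compared as strings).
     */
    static boolean hasConstructor(String className, String... paramTypeNames)
            throws ClassNotFoundException {
        // initialize=false: no static initializers are run.
        Class<?> cls = Class.forName(className, false, ConstructorCheck.class.getClassLoader());
        for (Constructor<?> ctor : cls.getDeclaredConstructors()) {
            String[] actual = Arrays.stream(ctor.getParameterTypes())
                    .map(Class::getName)
                    .toArray(String[]::new);
            if (Arrays.equals(actual, paramTypeNames)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // The constructor Tika 0.6 expects, according to the NoSuchMethodError above.
        boolean present = hasConstructor(
                "org.apache.poi.xwpf.usermodel.XWPFParagraph",
                "org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP",
                "org.apache.poi.xwpf.usermodel.XWPFDocument");
        System.out.println("XWPFParagraph(CTP, XWPFDocument) present: " + present);
    }
}
```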




Re: TikaEntityProcessor not working?

2011-11-08 Thread kumar8anuj
Erick, I configured the system the same way Brad did, but no documents were
being indexed and I was not getting any errors in the log either. I then
switched to Tika 0.6 and tried again, without success. The table columns
are getting indexed, but the document content is not. Let me know if I'm
not being clear.



Re: TikaEntityProcessor not working?

2011-11-08 Thread Erick Erickson
What's not clear is what you are doing to ensure that the file names pulled
from your database are being read (from disk? from a shared filesystem
somewhere?), analyzed and sent to Solr.

So, somewhere you need to actually use the file name to pass on to
one of the processors that'll actually send the *contents* of that file
to Solr along with the columns.

For instance, you haven't included your DIH configuration, so we can't
tell if you're trying to do anything like that here.
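As a first sanity check on the point above, the sketch below resolves each filename against a base directory (roughly what a `url="/some/path/${document.filename}"` attribute produces) and reports entries that are missing, unreadable or empty. It only checks the filesystem and does not involve DIH or Tika; the base path and the sample filenames in `main` are hypothetical placeholders.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class FileCheck {

    /**
     * Resolves each filename (as it would come from the database column)
     * against baseDir and describes every file that is missing, unreadable
     * or empty, in input order.
     */
    static List<String> problems(Path baseDir, List<String> filenames) throws IOException {
        List<String> out = new ArrayList<>();
        for (String name : filenames) {
            Path p = baseDir.resolve(name);
            if (!Files.isRegularFile(p)) {
                out.add(p + ": missing or not a regular file");
            } else if (!Files.isReadable(p)) {
                out.add(p + ": not readable by this process");
            } else if (Files.size(p) == 0) {
                out.add(p + ": empty file");
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical values: substitute your real base path and the
        // filenames returned by your SQL query.
        Path base = Paths.get("/some/path");
        List<String> names = List.of("report.pdf", "missing.pdf");
        problems(base, names).forEach(System.out::println);
    }
}
```

Run it as the same OS user that runs Solr, since readability depends on that user's permissions.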

Best
Erick




Re: TikaEntityProcessor not working?

2011-11-07 Thread kumar8anuj
I tried the same thing, but the problem still persists and my document is not
getting indexed. I am using Solr 3.4.0, which shipped with Tika 0.8; I replaced
the Tika core and parser jars with the 0.6 versions, but the document is still
not getting indexed. Nothing related to it shows up in my logs. Please help.
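One thing worth ruling out when swapping jars by hand (an assumption, not something confirmed in this thread) is that both the old and the new versions ended up on the classpath. A minimal sketch that groups the jars in a directory by artifact name and reports any artifact present in more than one version; it assumes the usual `artifact-version.jar` naming, ignores files that don't match, and the default `lib` directory in `main` is a placeholder.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DuplicateJars {

    // Splits "tika-core-0.6.jar" into artifact "tika-core" and version "0.6":
    // the artifact ends at the first "-" that is followed by a digit.
    private static final Pattern JAR = Pattern.compile("^(.+?)-(\\d.*)\\.jar$");

    /** Maps each artifact with two or more jars in libDir to its sorted jar names. */
    static Map<String, List<String>> duplicates(File libDir) {
        Map<String, List<String>> byArtifact = new TreeMap<>();
        File[] files = libDir.listFiles();
        if (files == null) {
            return byArtifact; // not a directory, or an I/O error occurred
        }
        for (File f : files) {
            Matcher m = JAR.matcher(f.getName());
            if (m.matches()) {
                byArtifact.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(f.getName());
            }
        }
        // Keep only artifacts that appear in more than one version.
        byArtifact.values().removeIf(list -> list.size() < 2);
        byArtifact.values().forEach(Collections::sort);
        return byArtifact;
    }

    public static void main(String[] args) {
        File lib = new File(args.length > 0 ? args[0] : "lib"); // placeholder path
        duplicates(lib).forEach((artifact, jars) ->
                System.out.println(artifact + " appears more than once: " + jars));
    }
}
```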




Re: TikaEntityProcessor not working?

2011-11-07 Thread Erick Erickson
You have to provide a lot more information about what you're doing. Are
you trying to use DIH? The extracting update request handler? What
do your config files look like?

Please review:
http://wiki.apache.org/solr/UsingMailingLists

Best
Erick




Re: TikaEntityProcessor not working?

2010-06-04 Thread Brad Greenlee
You are my hero. I replaced the Tika 0.8 snapshots that were included with Solr 
with 0.6 and it works now. Thank you!

Brad




Re: TikaEntityProcessor not working?

2010-06-03 Thread David George

Which version of Tika do you have? There was a problem introduced somewhere
between Tika 0.6 and Tika 0.7 whereby the TikaConfig method
config.getParsers() returned an empty parser list due to class loader
scope issues with Solr running under an application server.

There is a fix in the Tika 0.8 branch, and I note that a 0.8 snapshot of Tika
is included in the Solr trunk. I've not tried to get this to work and am
not sure what config is needed to make this work. I simply installed Tika
0.6, which can be downloaded from the Apache Tika website.
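To illustrate what a class loader scope issue looks like in general, the sketch below checks whether a class can be loaded through a given class loader, comparing the thread context loader with the loader of the calling class. It does not reproduce the Tika bug described above; `org.apache.tika.parser.pdf.PDFParser` is just an example parser class to probe.

```java
final class LoaderVisibility {

    /**
     * True if className can be loaded through the given loader. A null loader
     * means the bootstrap loader, which only sees core JDK classes.
     */
    static boolean visible(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "org.apache.tika.parser.pdf.PDFParser";
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        ClassLoader own = LoaderVisibility.class.getClassLoader();
        // Inside an application server these two loaders can differ, so a
        // class visible to one may be invisible to the other.
        System.out.println("context loader sees " + name + ": " + visible(name, context));
        System.out.println("own loader sees " + name + ": " + visible(name, own));
    }
}
```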


Re: TikaEntityProcessor not working?

2010-05-31 Thread Brad Greenlee

It is a file. Only the filename is stored in the database.

Brad


On May 31, 2010, at 2:59 AM, Noble Paul നോബിള്‍  नो 
ब्ळ् noble.p...@corp.aol.com wrote:



BinFileDataSource will only work with a file. Try FieldStreamDataSource.

On Mon, May 31, 2010 at 3:30 AM, Brad Greenlee b...@footle.org wrote:


Hi. I'm trying to get Solr to index a database in which one column is a
filename of a PDF document I'd like to index. My configuration looks like
this:

<dataConfig>
  <dataSource name="ds-db" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/document_db" user="user"
              password="password" readOnly="true"/>
  <dataSource name="ds-file" type="BinFileDataSource"/>
  <document name="documents">
    <entity name="document" dataSource="ds-db" query="select * from documents">
      <entity processor="TikaEntityProcessor"
              url="/some/path/${document.filename}" dataSource="ds-file" format="text">
        <field column="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>

I'm using Solr from trunk (as of two days ago). The import process
completes without errors, and it picks up the columns from the database,
but not the content from the PDF file. It is definitely trying to access
the PDF file, for if I give it an incorrect path name, it complains. It
doesn't seem to be attempting to index the PDF, though, as it completes in
about 40ms, whereas if I import the PDF via the ExtractingRequestHandler,
it takes about 11 seconds to index it.

I've also tried the tika example in example-DIH and that doesn't seem to
index anything, either. Am I doing something wrong, or is this just not
working yet?

Cheers,

Brad





--
Noble Paul | Systems Architect | AOL | http://aol.com