Re: upgrading to Tika 0.9 on Solr 1.4.1

2012-02-24 Thread bing
Hi, all, 

I tried to upgrade Tika 0.8 to Tika 0.10 on Solr 3.3.0, following similar
steps, but it failed.

1. Replace the following jars in /contrib/extraction/ 
fontbox-1.6.0, jempbox-1.6.0, pdfbox-1.6.0, tika-core-0.10,
tika-parsers-0.10;

2. Copy all the jars in /contrib/langid/* from solr3.5.0 

3. Copy /dist/apache-solr-langid-3.5.0 from solr3.5.0

4. Configure solrconfig.xml in solr3.3.0, adding the following lib and
definition of updateRequestProcessorChain.

  <lib dir="../../contrib/langid/lib" />
  <lib dir="../../dist/" regex="apache-solr-langid-\d.*\.jar" />

  <updateRequestProcessorChain name="langid">
    <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
      <str name="langid.fl">text,title,author</str>
      <str name="langid.langField">language_s</str>
      <str name="langid.fallback">en</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>


Errors:  (typical errors when factory is not found)

org.apache.solr.common.SolrException: Error loading class
'org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory'
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:389)
at 
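
One way to narrow down an "Error loading class" like this is to check whether any jar in the configured lib directories actually contains the class. Below is a minimal Python sketch of that check. The directory names mirror the lib entries above and are assumptions about the local layout, and jars_containing is a hypothetical helper, not part of Solr.

```python
import zipfile
from pathlib import Path


def jars_containing(class_name, lib_dirs):
    """Return the jar paths under lib_dirs that contain class_name."""
    # A class a.b.C is stored inside a jar as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for lib_dir in lib_dirs:
        directory = Path(lib_dir)
        if not directory.is_dir():
            # Skip directories that do not exist in this layout.
            continue
        for jar in sorted(directory.glob("*.jar")):
            try:
                with zipfile.ZipFile(jar) as zf:
                    if entry in zf.namelist():
                        hits.append(jar)
            except zipfile.BadZipFile:
                # Skip files that are not valid jar/zip archives.
                continue
    return hits


if __name__ == "__main__":
    # Assumed paths, relative to the Solr install; adjust to your layout.
    dirs = ["contrib/langid/lib", "dist"]
    name = ("org.apache.solr.update.processor."
            "LangDetectLanguageIdentifierUpdateProcessorFactory")
    found = jars_containing(name, dirs)
    print(found if found else "class not found in: " + ", ".join(dirs))
```

If the list comes back empty, the jars on the lib path simply do not ship that class, and no solrconfig.xml change will make it load.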

Has anyone tried something similar before? Please advise. Thank you.

Best Regards, 
Bing 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/upgrading-to-Tika-0-9-on-Solr-1-4-1-tp2570526p3772177.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-07-06 Thread Surendra
I have upgraded my Solr distribution to 3.2 and also the jars my application
refers to (in particular, the Solr jar in my application, which calls Solr, was
still 1.4.1, hence the javabin exception). I also updated
pdfbox/jempbox/fontbox to the latest versions and Tika to 0.9, which fixed
things for me!

-- Surendranadh




Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-06-22 Thread Surendra
Hi Chris, Andreas

I have upgraded to Solr 3.2 and everything seems fine now. I will have to
integrate this into my application and watch for any further issues. Again,
thanks for your patience and time.

--Surendra




Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-06-22 Thread Mattmann, Chris A (388J)
Glad it worked out!

Cheers,
Chris


++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-06-21 Thread Surendra
Hi Chris

I did a proper checkout of Tika 0.9, built the jars as described at
http://tika.apache.org/0.9/gettingstarted.html, and replaced the existing
Tika 0.4 jars with the 0.9 jars. I don't see any difference. The documents are
getting indexed, but the fmap.content field (attr_content) is still not
available to me. Am I missing something? Meanwhile, I'm digging further into
this issue. Any further help would be great. Thanks for your time.

-- Surendra




Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-06-21 Thread Surendra
Hi Andreas
I tried Solr 3.1 as well as 3.2, but I was not able to overcome these issues
with the newer versions either. I need the query attr_content:* to return
results (with 1.4.1 this works), which is not happening. Indexing works
in 3.1, but in 3.2 I get the following error:
Invalid version or the data in not in 'javabin' format
--Surendra





Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-06-21 Thread Mattmann, Chris A (388J)
Hi Surendra,

Thanks. Besides replacing the tika-*-0.9.jar files, you also need to replace
the dependency jar files for the other libs, since they have been
upgraded as well. It's also possible that, because of API changes, Solr 1.4.1
won't work with Tika 0.9 without modifying the ExtractingRequestHandler code...

Cheers,
Chris




Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-06-21 Thread Andreas Kemkes
We are successfully extracting PDF content with Solr 3.1 and Tika 0.9.

Replace
fontbox-1.3.1.jar jempbox-1.3.1.jar pdfbox-1.3.1.jar tika-core-0.8.jar 
tika-parsers-0.8.jar 

with
 
fontbox-1.4.0.jar jempbox-1.4.0.jar pdfbox-1.4.0.jar tika-core-0.9.jar 
tika-parsers-0.9.jar 

I'm not entirely certain whether a recompile of Solr was necessary.
Andreas
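
The swap described above can be scripted. This is only a sketch in Python: replace_jars is a hypothetical helper, the jar names are the ones listed in this message, and the two directory arguments are placeholders for the local Solr extraction lib directory and the directory holding the new jars. It defaults to a dry run that only reports what it would do.

```python
import shutil
from pathlib import Path

# Jar names taken from the message above.
OLD_JARS = ["fontbox-1.3.1.jar", "jempbox-1.3.1.jar", "pdfbox-1.3.1.jar",
            "tika-core-0.8.jar", "tika-parsers-0.8.jar"]
NEW_JARS = ["fontbox-1.4.0.jar", "jempbox-1.4.0.jar", "pdfbox-1.4.0.jar",
            "tika-core-0.9.jar", "tika-parsers-0.9.jar"]


def replace_jars(lib_dir, new_dir, old_names, new_names, dry_run=True):
    """Remove old jars from lib_dir and copy new ones in from new_dir.

    Returns the list of (action, jar name) steps. Nothing on disk is
    changed while dry_run is True.
    """
    lib_dir, new_dir = Path(lib_dir), Path(new_dir)
    # Refuse to start if a replacement jar is missing, so the lib
    # directory is never left half upgraded.
    missing = [n for n in new_names if not (new_dir / n).is_file()]
    if missing:
        raise FileNotFoundError("missing new jars: " + ", ".join(missing))
    actions = []
    for name in old_names:
        target = lib_dir / name
        if target.exists():
            actions.append(("remove", name))
            if not dry_run:
                target.unlink()
    for name in new_names:
        actions.append(("copy", name))
        if not dry_run:
            shutil.copy2(new_dir / name, lib_dir / name)
    return actions
```

Calling replace_jars(lib, downloads, OLD_JARS, NEW_JARS) prints nothing and changes nothing; inspect the returned steps, then repeat with dry_run=False and restart Solr.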





Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-06-20 Thread Surendra

Hey Chris

I have added tika-core 0.9 and tika-parsers 0.9 to Solr 1.4.1 (extraction/lib)
after building them from the source provided by Tika. Now I have an issue with
this. I am extracting PDF content using Solr. I have mapped fmap.content to
attr_content in the configurable params, which lets me see the entire
extracted document. After the Tika update, attr_content no longer appears in
the search results. When I restore the old 0.4 Tika jars, attr_content appears
again. I didn't find any exceptions in the console. Is this a known behavior
that someone has already faced? Can you guide me to resolve this?
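
For reference, the kind of request and check being described can be sketched like this in Python. The base URL and the document id are assumptions for illustration; /update/extract, literal.id, fmap.content and commit are the handler path and parameters described on the ExtractingRequestHandler wiki page. The sketch only builds the URLs and sends nothing.

```python
from urllib.parse import urlencode

# Assumed location of the Solr instance; adjust to your deployment.
SOLR_BASE = "http://localhost:8983/solr"


def extract_url(doc_id, content_field="attr_content", base=SOLR_BASE):
    """URL for posting a file to the extracting handler.

    fmap.content maps Tika's extracted body text onto content_field.
    """
    params = {
        "literal.id": doc_id,
        "fmap.content": content_field,
        "commit": "true",
    }
    return base.rstrip("/") + "/update/extract?" + urlencode(params)


def content_query_url(content_field="attr_content", base=SOLR_BASE):
    """URL for a query that matches documents with any value in content_field."""
    params = {"q": content_field + ":*", "fl": "id," + content_field}
    return base.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    print(extract_url("doc1"))      # "doc1" is a made-up document id
    print(content_query_url())
```

Running the same pair of requests before and after swapping the Tika jars makes it easy to see whether the extracted text stopped reaching attr_content or whether only the query side changed.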

-- Surendra







Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-06-20 Thread Mattmann, Chris A (388J)
Hi Surendra,

I don't think you can simply add a new tika-core-0.9 and tika-parsers-0.9 to
extraction/lib; I think you'll need to replace the set of prior Tika jars in
there. Have a look here to see what jars you would need to replace, HTH:

http://tika.apache.org/0.9/gettingstarted.html
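
A quick way to spot leftover jars after an upgrade is to list every artifact that is present in more than one version. This is a rough Python sketch under the assumption that the jars follow the usual artifact-version.jar naming; mixed_versions is a hypothetical helper and the directory is whatever your extraction lib path is.

```python
import re
from collections import defaultdict
from pathlib import Path

# Split "tika-core-0.9.jar" into artifact "tika-core" and version "0.9".
# Jars that do not follow <artifact>-<version>.jar naming are ignored.
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def versions_by_artifact(lib_dir):
    """Map each artifact in lib_dir to the versions found (string-sorted)."""
    found = defaultdict(set)
    for jar in Path(lib_dir).glob("*.jar"):
        match = JAR_NAME.match(jar.name)
        if match:
            found[match["artifact"]].add(match["version"])
    return {name: sorted(versions) for name, versions in found.items()}


def mixed_versions(lib_dir):
    """Return only the artifacts that are present in more than one version."""
    return {name: versions
            for name, versions in versions_by_artifact(lib_dir).items()
            if len(versions) > 1}
```

An empty result from mixed_versions on the extraction lib directory means no artifact is present twice; any entry it does return points at an old jar that was added to rather than replaced.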

Cheers,
Chris




Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-06-20 Thread Andreas Kemkes
I've unsuccessfully attempted to go down this road. There are API changes, some
of which I was able to solve by taking code snippets from Solr 3.1. Some
extraction-related tests wouldn't pass (look for 'Solr 1.4.1 and Tika 0.9 -
some tests not passing' in the archive). Ultimately, I decided that the then
newly released Solr 3.1 was the less rocky route. Not sure if that is an option
for you.

Andreas





Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-02-25 Thread Jan Høydahl
Your best bet is perhaps upgrading to latest 1.4 branch, i.e. 1.4.2-dev 
(http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4/)
It includes Tika 0.8-SNAPSHOT and is a compatible drop-in (war/jar) replacement 
with lots of other bug fixes you'd also like (check changes.txt).

svn co http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4
cd branch-1.4
ant dist

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 24. feb. 2011, at 21.42, jo wrote:

 I have tried the steps indicated here:
 http://wiki.apache.org/solr/ExtractingRequestHandler
 
 and when I try to parse a document nothing would happen, no error.. I have
 copied the jar files everywhere, and nothing.. can anyone give me the steps
 on how to upgrade just tika, btw, currently on 1.4.1 has tika 0.4
 
 thank you



Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-02-25 Thread Markus Jelsma
You don't want to use 0.8 if you're parsing PDF.



Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-02-25 Thread Mattmann, Chris A (388J)
Hi Jo,

You may consider checking out Tika trunk, where we recently have a Tika JAX-RS 
web service [1] committed as part of the tika-server module. You could probably 
wire DIH into it and accomplish the same thing.

Cheers,
Chris

[1] https://issues.apache.org/jira/browse/TIKA-593




Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-02-25 Thread jo

You guys are great. I will stick to the release version for now, and if I
have problems parsing I will give the branch jars a try. The reason I am
looking to upgrade Tika is that it keeps improving on things like
languages, MIME type support, and so on.

thanks again

JO


Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-02-25 Thread Darx Oman
hi
if you want to index PDF files then use Tika 0.6,
because 0.7 and 0.8 do not correctly detect the PDF parser


Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-02-25 Thread Andreas Kemkes
According to the Tika release notes, it's fixed in 0.9. Haven't tried it
myself.

A critical backwards incompatible bug in PDF parsing that was introduced in
Tika 0.8 has been fixed. (TIKA-548)

Andreas





Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-02-25 Thread Mattmann, Chris A (388J)
Yep it's fixed in 0.9.

Cheers,
Chris
