RE: Solr Exception
Hi,

Try adding the pdfbox and fontbox jars to your Solr lib folder, then restart Tomcat. I hope that solves the issue.

Thanks.

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Exception-tp2654719p2798609.html
Sent from the Solr - Dev mailing list archive at Nabble.com.
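Before restarting Tomcat, it may help to confirm that both jars actually landed in the lib folder. The following is only a minimal sketch using the Java standard library; the class name `LibJarCheck`, the method `missing`, and the default path `solr/lib` are made up for illustration, so pass your own lib directory as the first argument.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class LibJarCheck {

    // Returns every required name fragment for which no matching .jar
    // file appears in the given list of file names.
    static List<String> missing(String[] fileNames, String... required) {
        List<String> result = new ArrayList<>();
        for (String needle : required) {
            boolean found = false;
            if (fileNames != null) {
                for (String name : fileNames) {
                    String lower = name.toLowerCase();
                    if (lower.endsWith(".jar") && lower.contains(needle)) {
                        found = true;
                        break;
                    }
                }
            }
            if (!found) {
                result.add(needle);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical default path; replace with your actual Solr lib folder.
        File libDir = new File(args.length > 0 ? args[0] : "solr/lib");
        String[] names = libDir.list(); // null if the directory does not exist
        List<String> gaps = missing(names, "pdfbox", "fontbox");
        System.out.println(gaps.isEmpty()
                ? "pdfbox and fontbox jars found in " + libDir
                : "Missing jars in " + libDir + ": " + gaps);
    }
}
```

The check only looks at file names, so it cannot tell whether the jar versions match the Tika version in use.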
Re: Solr Exception
Did you double-check the <lib> element in your solrconfig.xml that points to the Tika jar you're using?

2011/3/9 Deepak Singh
> downloaded apache-solr-3.1 still it giving TIKA Exception
RE: Solr Exception
Then you should open a bug report with TIKA and provide the files that do not parse. Often the problem lies in one of TIKA's underlying parser libraries, such as Apache POI, in which case there is nothing the TIKA developers can do. Another TIKA issue may already cover the same problem, so search the issue tracker first!

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

From: Deepak Singh [mailto:deep...@praumtech.com]
Sent: Wednesday, March 09, 2011 2:09 PM
To: dev@lucene.apache.org
Subject: Re: Solr Exception

downloaded apache-solr-3.1 still it giving TIKA Exception
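When writing up such a report, the innermost "Caused by" entry is usually the line that points at the failing parser library. As a rough illustration only, the sketch below walks a Java exception chain down to its root cause using nothing but the standard library. The class name `RootCause` is hypothetical, and the two exceptions in `main` are stand-ins built from messages quoted in this thread, not real Tika output.

```java
import java.io.IOException;

public class RootCause {

    // Follows getCause() until the innermost exception is reached.
    // The self-reference check avoids looping on a cause that points to itself.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in exceptions that mimic the nesting seen in the log excerpt.
        Exception inner = new IOException("block[ 0 ] already removed");
        Exception outer = new RuntimeException("TIKA-198: Illegal IOException", inner);

        Throwable root = rootCause(outer);
        System.out.println(root.getClass().getName() + ": " + root.getMessage());
    }
}
```

In a real report, the full stack trace from the server log is still more useful than the root cause alone.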
Re: Solr Exception
I downloaded apache-solr-3.1, but it still gives the TIKA exception.

On Wed, Mar 9, 2011 at 5:11 PM, Deepak Singh wrote:
> oh, thanks for the better solution.
Re: Solr Exception
Oh, thanks for the better solution.

On Wed, Mar 9, 2011 at 4:36 PM, Uwe Schindler wrote:
> Hi,
>
> These are all bugs in Apache TIKA not Solr, some of them are already fixed
> in later TIKA versions (so you may try the soon-to-be-released Solr 3.1
> version which contains a newer TIKA bundled).
>
> Uwe
RE: Solr Exception
Hi,

These are all bugs in Apache TIKA, not Solr. Some of them are already fixed in later TIKA versions, so you may want to try the soon-to-be-released Solr 3.1, which bundles a newer TIKA.

Uwe

From: Deepak Singh [mailto:deep...@praumtech.com]
Sent: Wednesday, March 09, 2011 12:03 PM
To: dev@lucene.apache.org
Subject: Re: Solr Exception

HTTP ERROR :500 (INTERNAL SERVER ERROR)
[...]
Re: Solr Exception
*HTTP ERROR :500 (INTERNAL SERVER ERROR)*

*For DOC files:*
org.apache.tika.exception.TikaException :
-Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@1248f2
Caused by: org.apache.poi.hpsf.IllegalPropertySetDataException: The property set claims to have a size of 16 bytes. However, it exceeds 16 bytes.

-TIKA-198: Illegal IOException from org.apache.tika.parser.microsoft.OfficeParser@1248f2
Caused by: java.io.IOException: block[ 0 ] already removed - does your POIFS have circular or duplicate block references?

*For PDF files:*
org.apache.tika.exception.TikaException :
-Unexpected RuntimeException from org.apache.tika.parser.Pdfparser@1b4cd65
Caused by: java.lang.ClassCastException: org.pdfbox.cos.COSArray cannot be cast to org.pdfbox.cos.COSDictionar
Caused by: java.lang.NullPointerException

-Unable to extract PDF content

*HTTP ERROR:400 (BAD REQUEST)*
-This error occurs when some fields are missing
ERROR:unknown field 'language' (Ex:content_status, description,version)

On Wed, Mar 9, 2011 at 4:19 PM, Gora Mohanty wrote:
> Hi,
>
> This is probably better directed to the user list. Also, please provide
> details of the exceptions from your log files.
>
> Regards,
> Gora
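The 400 "unknown field" error typically means that Tika extracted a metadata field (here 'language') that the Solr schema does not define. One possible workaround, sketched here under assumptions, is the extracting request handler's `uprefix` parameter, which prefixes fields not defined in the schema so that a matching dynamic field can absorb them. The class name `ExtractUrl`, the base URL, the document id, and the prefix `attr_` are all illustrative; the handler path `/update/extract` and a dynamic field matching `attr_*` are assumed to exist in your solrconfig.xml and schema.xml.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class ExtractUrl {

    // Builds an extract request URL that sets the document id and maps
    // fields unknown to the schema onto a prefix via the uprefix parameter.
    static String build(String baseUrl, String docId, String unknownFieldPrefix) {
        try {
            return baseUrl + "/update/extract"
                    + "?literal.id=" + URLEncoder.encode(docId, "UTF-8")
                    + "&uprefix=" + URLEncoder.encode(unknownFieldPrefix, "UTF-8")
                    + "&commit=true";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical host, port, and id; adjust to your Tomcat deployment.
        System.out.println(build("http://localhost:8080/solr", "doc 1", "attr_"));
    }
}
```

The file itself would then be sent as the body of a POST to that URL. Whether ignoring or storing the unknown fields is the better choice depends on the schema, and this does not address the 500 errors, which come from the parsers.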
Re: Solr Exception
Hi,

This is probably better directed to the user list. Also, please provide details of the exceptions from your log files.

Regards,
Gora
Solr Exception
Hi,

While indexing documents (doc, docx, pdf), I am getting HTTP 500 and 400 exceptions.