RE: Solr Exception

2011-04-10 Thread rahul
Hi,

Try adding the pdfbox and fontbox jars to your Solr lib folder, then
restart Tomcat.

I hope that solves the issue.
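
If you want to confirm the jars are really in place before restarting, a small check like the sketch below may help. It assumes only that the jar file names start with "pdfbox" and "fontbox"; the function name `missing_jars` is a placeholder of mine, not part of Solr or Tika.

```python
from pathlib import Path

# Jar name prefixes to look for (assumed from the usual pdfbox-x.y.z.jar naming).
REQUIRED_PREFIXES = ("pdfbox", "fontbox")


def missing_jars(lib_dir, prefixes=REQUIRED_PREFIXES):
    """Return the prefixes that have no matching .jar file in lib_dir."""
    jar_names = [p.name.lower() for p in Path(lib_dir).glob("*.jar")]
    return [
        prefix
        for prefix in prefixes
        if not any(name.startswith(prefix) for name in jar_names)
    ]
```

Call `missing_jars()` with the path of your own Solr lib folder; an empty list means both jars were found.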

Thanks. 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Exception-tp2654719p2798609.html
Sent from the Solr - Dev mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: Solr Exception

2011-04-09 Thread rahul
Hi,

Try adding the pdfbox and fontbox jars to your Solr lib folder, then
restart Tomcat.

I hope that solves the issue.

Thanks. 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Exception-tp2654719p2798619.html
Sent from the Solr - Dev mailing list archive at Nabble.com.




Re: Solr Exception

2011-03-09 Thread Tommaso Teofili
Did you double-check the <lib> element in your solrconfig.xml that points
to the Tika jar you're using?
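
A rough way to check this is to list the <lib> entries and see whether the directories they name exist. The sketch below assumes the `dir`, `regex` and `path` attributes used in the example solrconfig.xml, and that `dir` is resolved relative to the core's instance directory. The function names and the sample config are illustrative, not taken from your setup.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Sample config modelled on the stock example; replace with your own file's text.
EXAMPLE = r"""
<config>
  <lib dir="../../contrib/extraction/lib" />
  <lib dir="../../dist/" regex="apache-solr-cell-\d.*\.jar" />
</config>
"""


def lib_entries(solrconfig_xml):
    """Return (dir, regex, path) attribute tuples for each <lib> element."""
    root = ET.fromstring(solrconfig_xml)
    return [
        (lib.get("dir"), lib.get("regex"), lib.get("path"))
        for lib in root.iter("lib")
    ]


def unresolved_dirs(solrconfig_xml, instance_dir):
    """List <lib dir=...> values that do not exist relative to instance_dir."""
    base = Path(instance_dir)
    return [
        d
        for d, _regex, _path in lib_entries(solrconfig_xml)
        if d is not None and not (base / d).is_dir()
    ]
```

Any directory returned by `unresolved_dirs()` is a <lib> entry that Solr cannot load jars from, which would be consistent with the Tika classes coming from somewhere else or not being found at all.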



RE: Solr Exception

2011-03-09 Thread Uwe Schindler
Then you should open a bug report with TIKA and attach the files that do not
parse. Often the problem lies in one of TIKA's underlying parser libraries,
such as Apache POI, in which case there is nothing the TIKA developers can
do. Another TIKA issue may already cover the same problem, so search the
issue tracker first!

 

Uwe

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: u...@thetaphi.de

 



Re: Solr Exception

2011-03-09 Thread Deepak Singh
I downloaded apache-solr-3.1, but it is still giving the TIKA exception.



Re: Solr Exception

2011-03-09 Thread Deepak Singh
Oh, thanks for the better solution.



RE: Solr Exception

2011-03-09 Thread Uwe Schindler
Hi,

 

These are all bugs in Apache TIKA, not Solr. Some of them are already fixed
in later TIKA versions, so you may want to try the soon-to-be-released Solr
3.1, which bundles a newer TIKA.

 

Uwe

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: u...@thetaphi.de

 




Re: Solr Exception

2011-03-09 Thread Deepak Singh
*HTTP ERROR: 500 (INTERNAL SERVER ERROR)*

*For DOC files:*
org.apache.tika.exception.TikaException:
- Unexpected RuntimeException from
org.apache.tika.parser.microsoft.OfficeParser@1248f2
Caused by: org.apache.poi.hpsf.IllegalPropertySetDataException: The property
set claims to have a size of 16 bytes. However, it exceeds 16 bytes.

- TIKA-198: Illegal IOException from
org.apache.tika.parser.microsoft.OfficeParser@1248f2
Caused by: java.io.IOException: block[ 0 ] already removed - does your POIFS
have circular or duplicate block references?


*For PDF files:*
org.apache.tika.exception.TikaException:
- Unexpected RuntimeException from org.apache.tika.parser.Pdfparser@1b4cd65
Caused by: java.lang.ClassCastException: org.pdfbox.cos.COSArray cannot be
cast to org.pdfbox.cos.COSDictionar
Caused by: java.lang.NullPointerException

- Unable to extract PDF content

*HTTP ERROR: 400 (BAD REQUEST)*
- This error occurs when some fields are missing:
ERROR: unknown field 'language' (also e.g. content_status, description,
version)




Re: Solr Exception

2011-03-09 Thread Gora Mohanty
Hi,

This is probably better directed to the user list. Also, please provide
details of the exceptions from your log files.

Regards,
Gora


Solr Exception

2011-03-09 Thread Deepak Singh
Hi,

While indexing documents (doc, docx, pdf), I am getting HTTP 500 and 400
errors.