RE: how to set maxFieldLength to unlimited

2010-12-01 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much, Jan. I use curl to index PDF files. Is there another way to do it?

I changed the positionIncrement to 0, but I couldn't get that to work either.

Thanks,
Xiaohui 


Re: how to set maxFieldLength to unlimited

2010-12-01 Thread jan.kurella
I don't know about upload limitations, but there are certainly some in
the default settings, which could explain the 20MB limit. Which upload
mechanism do you use on the Solr side? I guess this is not a Lucene
problem but rather one in Solr's HTTP layer.

If you manage to stream your PDF and start parsing it on the stream,
you should then go for the filter that sets the positionIncrement to 0,
as mentioned.

What we once did for PDF files was to parse them into plain text
beforehand and index that with a streamReader (but we were using
Lucene directly).


Regards, Jan
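
For what it's worth, here is a minimal Python sketch of the "plain text first" idea, adapted to Solr rather than to Lucene directly (which is what the message above describes). It assumes the text has already been extracted from the PDF by some separate tool, which is not shown, and it only builds the XML <add> message that would then be posted to Solr's update handler. The field names "id" and "text" and the helper name build_add_message are assumptions for illustration; they depend on your schema.

```python
import xml.etree.ElementTree as ET


def build_add_message(doc_id, text, id_field="id", text_field="text"):
    """Build a Solr XML <add> message for one document of already-extracted text.

    The field names are assumptions; use whatever your schema.xml defines.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name=id_field).text = doc_id
    ET.SubElement(doc, "field", name=text_field).text = text
    # ElementTree escapes characters such as & and < in the text for us.
    return ET.tostring(add, encoding="unicode")


message = build_add_message("report-1", "plain text extracted from the PDF & more")
print(message)
```

The resulting string could be saved to a file and posted with curl, the same way other XML update messages are sent, so the large binary PDF itself never has to go through the upload.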


RE: how to set maxFieldLength to unlimited

2010-12-01 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much for your reply, Jan. I just found that I cannot index PDF files
larger than 20MB.

I use curl to index them and didn't get any error either. Do you have any
suggestions for indexing PDF files larger than 20MB?

Thanks,
Xiaohui 


RE: how to set maxFieldLength to unlimited

2010-12-01 Thread jan.kurella
You just can't set it to "unlimited". What you could do is ignore the
positions and put in a filter that sets the position increment of every token
but the first to 0 (this means the field length will be just 1, with all tokens
"stacked" on the first position).
You could also break per page, so that each "page" is put on a new position.

Jan
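
To make the "stacking" idea concrete, here is a toy model in plain Python. It is not Lucene code and uses no Lucene API; the function names are made up for illustration. It only models the description above, where a token's position is the running sum of the position increments, and it does not show how Lucene itself counts tokens against maxFieldLength.

```python
def assign_positions(tokens, increments):
    """Return (token, position) pairs; position is the running sum of increments."""
    position = 0
    placed = []
    for token, increment in zip(tokens, increments):
        position += increment
        placed.append((token, position))
    return placed


def stacked_increments(tokens):
    """Increment 1 for the first token and 0 for all others (everything stacked)."""
    return [1 if i == 0 else 0 for i in range(len(tokens))]


def per_page_increments(pages):
    """Increment 1 for the first token of each page, 0 for the rest of that page."""
    increments = []
    for page in pages:
        increments.extend(1 if i == 0 else 0 for i in range(len(page)))
    return increments


pages = [["big", "pdf", "file"], ["second", "page"]]
tokens = [token for page in pages for token in page]

stacked = assign_positions(tokens, stacked_increments(tokens))
by_page = assign_positions(tokens, per_page_increments(pages))

# In this toy model, "length" is simply the number of distinct positions used.
stacked_length = len({position for _, position in stacked})
by_page_length = len({position for _, position in by_page})
print(stacked_length, by_page_length)
```

With everything stacked, all five tokens share one position; with the per-page variant, each page gets its own position, so some positional information survives at page granularity.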


RE: how to set maxFieldLength to unlimited

2010-12-01 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Does anyone know how to index a very large PDF file (more than 100MB)?

Thanks so much,
Xiaohui 


RE: how to set maxFieldLength to unlimited

2010-11-30 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
I set maxFieldLength to 2147483647, restarted Tomcat, and re-indexed the PDF
files. I also commented out the one in the <mainIndex> section. Unfortunately,
the files are still truncated if the file size is more than 20MB.

Any suggestions? I really appreciate your help!
Xiaohui 


RE: how to set maxFieldLength to unlimited

2010-11-30 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much for your help!
Xiaohui 


Re: how to set maxFieldLength to unlimited

2010-11-30 Thread Erick Erickson
Set the <maxFieldLength> value in solrconfig.xml to, say, 2147483647.

Also, see this thread for a common gotcha:
http://lucene.472066.n3.nabble.com/Solr-ignoring-maxFieldLength-td473263.html
It appears you can just comment out the one in the <mainIndex> section.

Best
Erick
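
A quick way to check a config for the gotcha mentioned above is to list every maxFieldLength element it contains. The Python sketch below does that on a hypothetical, trimmed-down solrconfig.xml embedded as a string. The section names indexDefaults and mainIndex and the value 10000 are assumptions based on the example config that shipped with Solr around that time, so check them against your own file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down solrconfig.xml; a real file has many more settings.
SAMPLE_CONFIG = """
<config>
  <indexDefaults>
    <maxFieldLength>2147483647</maxFieldLength>
  </indexDefaults>
  <mainIndex>
    <maxFieldLength>10000</maxFieldLength>
  </mainIndex>
</config>
"""


def find_max_field_lengths(config_xml):
    """Return (section name, value) for every maxFieldLength directly under a section."""
    root = ET.fromstring(config_xml)
    found = []
    for section in root:
        for setting in section.findall("maxFieldLength"):
            found.append((section.tag, int(setting.text)))
    return found


settings = find_max_field_lengths(SAMPLE_CONFIG)
for section, value in settings:
    print(section, value)
```

If more than one entry shows up with different values, as in this sample, the smaller one may be the value that actually takes effect, which is why commenting out the second entry is suggested. 2147483647 is the largest 32-bit signed integer (2**31 - 1), so it is the closest thing to "unlimited" that the setting accepts.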


how to set maxFieldLength to unlimited

2010-11-30 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
I need to index and search some PDF files which are very big (around 1000 pages
each). How can I set maxFieldLength to unlimited?

Thanks so much for your help in advance,
Xiaohui