RE: how to set maxFieldLength to unlimitd
Thanks so much, Jan. I use curl to index the PDF files. Is there another way to do it? I also changed the positionIncrement to 0, but I couldn't get that to work either. Thanks, Xiaohui
Re: how to set maxFieldLength to unlimitd
I don't know about upload limitations, but there are certainly some in the default settings, and that could explain the 20MB limit. Which upload mechanism do you use on the Solr side? I guess this is not a Lucene problem but rather one in Solr's HTTP layer. If you manage to stream your PDF and start parsing it on the stream, you should then go for the filter that sets the positionIncrement to 0, as mentioned. What we once did for PDF files was to parse them into plain text beforehand and index that text with a streamReader (but we were using Lucene directly). Regards, Jan
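Jan's suggestion of converting the PDFs to plain text beforehand and indexing the text could look roughly like the sketch below. It only builds a Solr XML `<add>` message and the matching HTTP request; it does not extract text from a PDF and does not send anything. The field names `id` and `text`, the update URL, and both function names are assumptions for illustration and would have to match the actual schema and installation.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_message(doc_id, text, id_field="id", text_field="text"):
    """Build a Solr XML <add> message for one already-extracted document.

    The field names are assumptions; they must exist in the schema in use.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in ((id_field, doc_id), (text_field, text)):
        field = ET.SubElement(doc, "field", name=name)
        field.text = value  # ElementTree escapes <, > and & on output
    return ET.tostring(add, encoding="utf-8")


def build_update_request(message, update_url="http://localhost:8983/solr/update"):
    """Wrap the message in a POST request (the URL is an assumed default)."""
    return urllib.request.Request(
        update_url,
        data=message,
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )


if __name__ == "__main__":
    # The text would come from a separate PDF-to-text step, not shown here.
    msg = build_add_message("report-1", "plain text extracted from the PDF")
    req = build_update_request(msg)
    print(req.get_method(), req.full_url, len(msg), "bytes")
```

Sending the request would be a call to `urllib.request.urlopen(req)`, followed by a commit. Whether this route avoids the 20MB problem depends on where the limit actually sits, which the thread does not settle.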
RE: how to set maxFieldLength to unlimitd
Thanks so much for your reply, Jan. I just found that I cannot index PDF files larger than 20MB. I use curl to index them, and I didn't get any error either. Do you have any suggestions for indexing PDF files larger than 20MB? Thanks, Xiaohui
RE: how to set maxFieldLength to unlimitd
You just can't set it to "unlimited". What you could do is ignore the positions and put in a filter that sets the position increment of every token but the first to 0 (which means the field length will be just 1, with all tokens "stacked" on the first position). You could also break per page, so that each "page" is put on a new position. Jan
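The position bookkeeping Jan describes can be sketched without Lucene. The snippet below is only a model of the idea, not a Lucene TokenFilter, and all function names are hypothetical. It shows which position increments such a filter would assign and which absolute positions result, both for the "stack everything" variant and for the per-page variant. It does not verify how a given Lucene version counts tokens against maxFieldLength.

```python
def stack_positions(tokens):
    """Assign increment 1 to the first token and 0 to every later token."""
    return [(tok, 1 if i == 0 else 0) for i, tok in enumerate(tokens)]


def absolute_positions(tokens_with_increments):
    """Turn position increments into absolute positions (first one is 0)."""
    position = -1
    result = []
    for tok, increment in tokens_with_increments:
        position += increment
        result.append((tok, position))
    return result


def page_positions(pages):
    """Per-page variant: all tokens of a page share that page's position."""
    result = []
    for page_no, page_tokens in enumerate(pages):
        result.extend((tok, page_no) for tok in page_tokens)
    return result


if __name__ == "__main__":
    tokens = ["solr", "indexes", "pdf"]
    print(absolute_positions(stack_positions(tokens)))
    print(page_positions([["solr", "indexes"], ["pdf"]]))
```

One consequence worth keeping in mind: once tokens share a position, phrase and proximity queries can no longer tell their order apart, so the per-page variant keeps at least page-level distance between terms.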
RE: how to set maxFieldLength to unlimitd
Does anyone know how to index a very large PDF file (more than 100MB)? Thanks so much, Xiaohui
RE: how to set maxFieldLength to unlimitd
I set maxFieldLength to 2147483647, restarted Tomcat and re-indexed the PDF files. I also commented out the one in the section. Unfortunately, the files are still truncated if they are larger than 20MB. Any suggestions? I really appreciate your help! Xiaohui
RE: how to set maxFieldLength to unlimitd
Thanks so much for your help! Xiaohui
Re: how to set maxFieldLength to unlimitd
Set the maxFieldLength value in solrconfig.xml to, say, 2147483647. Also, see this thread for a common gotcha: http://lucene.472066.n3.nabble.com/Solr-ignoring-maxFieldLength-td473263.html. It appears you can just comment out the one in the section. Best, Erick
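The gotcha Erick points to is that maxFieldLength can be set in more than one place in solrconfig.xml. A quick way to see every occurrence is to list them, as in the sketch below. The sample config is a hypothetical excerpt: the section names `indexDefaults` and `mainIndex` are taken from Solr 1.4-style configs and are an assumption about which sections are meant here, and the values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of a solrconfig.xml; section names and values are
# assumptions for illustration, not copied from the thread.
SAMPLE_CONFIG = """
<config>
  <indexDefaults>
    <maxFieldLength>2147483647</maxFieldLength>
  </indexDefaults>
  <mainIndex>
    <maxFieldLength>10000</maxFieldLength>
  </mainIndex>
</config>
"""


def find_max_field_lengths(config_xml):
    """Return (section name, value) for every <maxFieldLength> element."""
    root = ET.fromstring(config_xml.strip())
    found = []
    for section in root:
        for elem in section.iter("maxFieldLength"):
            found.append((section.tag, int(elem.text.strip())))
    return found


if __name__ == "__main__":
    for section, value in find_max_field_lengths(SAMPLE_CONFIG):
        print(f"{section}: maxFieldLength = {value}")
```

If more than one entry shows up, a smaller value in one section may be the one that takes effect, which is why commenting out the second occurrence is suggested. On a real installation, the file contents would be read from disk and passed to `find_max_field_lengths` in place of the sample string.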
how to set maxFieldLength to unlimitd
I need to index and search some PDF files that are very big (around 1000 pages each). How can I set maxFieldLength to unlimited? Thanks so much for your help in advance, Xiaohui