Re: Solr Indexing MAX FILE LIMIT

2012-11-15 Thread Alexandre Rafalovitch
Maybe you can start by testing this with split -l and xargs :-) These are
standard Unix toolkit approaches, and since you already use one of them
(curl), you may be happy to use others too.
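Here is a minimal sketch of the split/xargs approach. It assumes a POSIX-like shell (for example Cygwin on Windows), a CSV whose first line is a header of plain field names, no quoted values that span several lines, and the localhost:8080 CSV handler URL used elsewhere in this thread. The function names, piece size and job count are hypothetical placeholders. split -l does not copy the header into each piece, so the sketch strips the header before splitting and passes it to Solr as fieldnames with header=false.

```shell
#!/bin/sh
# Sketch only: URLs, function names and sizes are hypothetical.
SOLR_CSV=${SOLR_CSV:-http://localhost:8080/solr/update/csv}
SOLR_UPDATE=${SOLR_UPDATE:-http://localhost:8080/solr/update}

# split_body FILE OUTDIR LINES
# Drops the header line of FILE and splits the remaining rows into
# OUTDIR/piece_aa, OUTDIR/piece_ab, ... with at most LINES rows each.
split_body() {
  mkdir -p "$2"
  tail -n +2 "$1" | split -l "$3" - "$2/piece_"
}

# post_pieces FILE OUTDIR JOBS
# Posts every piece, JOBS at a time (xargs -P is a GNU/BSD extension, not
# POSIX), naming the columns explicitly because the pieces have no header.
post_pieces() {
  # strip a trailing CR in case the file has Windows line endings
  fields=$(head -n 1 "$1" | tr -d '\r')
  ls "$2"/piece_* | xargs -P "$3" -I{} \
    curl -s "$SOLR_CSV?header=false&fieldnames=$fields" \
      --data-binary @{} -H 'Content-type:text/plain; charset=utf-8'
  # Commit once after all pieces have been sent, not once per piece.
  curl -s "$SOLR_UPDATE" -H 'Content-type:text/xml; charset=utf-8' \
    --data-binary '<commit/>'
}
```

For example, split_body eighth.csv pieces 500000 followed by post_pieces eighth.csv pieces 4 would send the rows in four parallel streams.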

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)





Re: Solr Indexing MAX FILE LIMIT

2012-11-14 Thread mitra
Thank you, Erick.

I didn't know that we could write a Java class for it. Can you give me some
pointers on how to do that?

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Indexing-MAX-FILE-LIMIT-tp4019952p4020407.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr Indexing MAX FILE LIMIT

2012-11-13 Thread Markus Jelsma
Hi - instead of trying to make the system ingest such large files, perhaps you
can split the files into many small pieces.
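One way to split so that every piece is still a complete CSV, which the same curl command can then ingest unchanged, is to copy the header line into each piece. Below is a minimal awk sketch. It assumes the first line is the header and that no quoted value spans several lines. The function name and the output naming scheme are illustrative.

```shell
#!/bin/sh
# split_with_header FILE LINES PREFIX  (hypothetical helper)
# Writes PREFIX1.csv, PREFIX2.csv, ... each holding the header line of FILE
# followed by at most LINES data rows.
split_with_header() {
  awk -v n="$2" -v prefix="$3" '
    NR == 1 { header = $0; next }          # remember the header, do not emit it yet
    (NR - 2) % n == 0 {                    # first row of a new piece
      if (out) close(out)                  # close the previous piece, if any
      out = prefix (++part) ".csv"
      print header > out
    }
    { print > out }                        # every data row goes to the current piece
  ' "$1"
}
```

For example, split_with_header eighth.csv 1000000 eighth_part writes eighth_part1.csv, eighth_part2.csv, and so on.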
 
-Original message-
 From:mitra mitra.re...@ornext.com
 Sent: Tue 13-Nov-2012 09:05
 To: solr-user@lucene.apache.org
 Subject: Solr Indexing MAX FILE LIMIT
 
 Hello guys,
 
 I'm using Apache Solr 3.6.1 on Tomcat 7 to index CSV files with curl on a
 Windows machine.
 
 ** My question: what is the maximum CSV file size when doing an HTTP POST,
 or when using the following curl command?
 curl http://localhost:8080/solr/update/csv -F stream.file=D:\eighth.csv -F commit=true -F optimize=true -F encapsulate= -F keepEmpty=true
 
 ** The data is quite large: we have to index CSV files ranging from 8 to
 10 GB.
 
 ** What would be the optimal settings for indexing parameters such as
 commit, for better performance on a machine with 8 GB of RAM?
 
 Please advise.
 
 Thanks in advance
 
 
 
 


RE: Solr Indexing MAX FILE LIMIT

2012-11-13 Thread mitra
Thank you.


*** I understand that Tomcat's default maximum size for an HTTP POST is 2 MB.
Can we change that somehow, so that I don't need to split the 10 GB CSV into
2 MB chunks?

curl http://localhost:8080/solr/update/csv -F stream.file=D:\eighth.csv -F commit=true -F optimize=true -F encapsulate= -F keepEmpty=true

*** As I mentioned, I'm posting with the command above rather than with the
following format:

curl http://localhost:8080/solr/update/csv --data-binary @eighth.csv -H 'Content-type:text/plain; charset=utf-8'

*** My question: does the limit still apply when I'm not using the
--data-binary format above?
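The two commands differ in what actually travels over HTTP. The sketch below contrasts them, using hypothetical function names and the URL from the commands above. With stream.file only the path is sent and Solr opens the file from its own file system. That requires remote streaming to be enabled (enableRemoteStreaming="true" on requestParsers in solrconfig.xml) and a path that is valid on the Solr server. With --data-binary, curl uploads the whole file as the request body. Separately, Solr's CSV handler documents its quoting parameter as encapsulator, so the encapsulate= in the command above is probably ignored.

```shell
#!/bin/sh
# Sketch only: URL and function names are placeholders.
SOLR_CSV=${SOLR_CSV:-http://localhost:8080/solr/update/csv}

# Only the path is sent (as a form field). Solr reads the file itself, so the
# path must exist on the Solr server and remote streaming must be enabled.
post_by_path() {
  curl "$SOLR_CSV" -F "stream.file=$1" -F commit=true
}

# The file's bytes are uploaded as the request body.
post_by_upload() {
  curl "$SOLR_CSV?commit=true" --data-binary "@$1" \
    -H 'Content-type:text/plain; charset=utf-8'
}
```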






Re: Solr Indexing MAX FILE LIMIT

2012-11-13 Thread Erick Erickson
Have you considered writing a small SolrJ (or other client) program that
processes the rows in your huge file and sends them to Solr in sensibly sized
chunks? That would give you much finer control over how the file is
processed, how many docs are sent to Solr at a time, and what to do with
errors. You could even run N programs simultaneously to increase throughput...
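The suggestion here is a SolrJ (Java) program. As a rough stand-in for the same idea (per-chunk control and error handling), this shell-and-curl sketch posts pre-split pieces one at a time. It records every piece whose HTTP status is not 200 so that piece can be retried later. The sketch assumes header-less pieces named piece_* (for example produced with split -l), a header of plain field names passed in as an argument, and the localhost:8080 URL used in this thread. The function name is hypothetical. Running several copies on different piece directories would give the "N simultaneous programs" variant.

```shell
#!/bin/sh
# Sketch only: URL, function name and file layout are hypothetical.
SOLR_CSV=${SOLR_CSV:-http://localhost:8080/solr/update/csv}

# post_and_log PIECE_DIR FIELDNAMES FAILED_LOG
# Sends each piece in turn and appends "<piece> <http status>" to FAILED_LOG
# for every piece that did not get HTTP 200 (000 means curl got no response).
post_and_log() {
  for piece in "$1"/piece_*; do
    status=$(curl -s -o /dev/null -w '%{http_code}' \
      "$SOLR_CSV?header=false&fieldnames=$2" \
      --data-binary "@$piece" -H 'Content-type:text/plain; charset=utf-8')
    if [ "$status" != 200 ]; then
      echo "$piece $status" >> "$3"
    fi
  done
}
```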

FWIW,
Erick

