RE: SOLR - Unable to execute query error - DIH

2013-03-28 Thread kobe.free.wo...@gmail.com
Thanks James.

We have tried the following options *(individually)*, including the one you
suggested:

1. selectMethod=cursor
2. batchSize=-1
3. responseBuffering=adaptive

But the indexing process doesn't seem to be improving at all. When we index
a set of 500 rows, it works and completes in 18 min. For 1000K rows,
indexing took 22 hours. But when we try to index the complete set of
750K rows, it shows no progress and keeps on executing.

Currently both the SQL Server machine and the SOLR machine are running on
4 GB of RAM. Is the above behavior expected with this configuration? If we
consider upgrading the RAM, which machine should get it: the SOLR machine
or the SQL Server machine?

Are there any other efficient methods to import/index data from SQL Server
to SOLR?

Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-Unable-to-execute-query-error-DIH-tp4051028p4051981.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: SOLR - Unable to execute query error - DIH

2013-03-28 Thread Dyer, James
You may want to run your JDBC driver in trace mode just to see if it is picking 
up these different options. I know from experience that the selectMethod 
parameter can sometimes be important to prevent SQL Server drivers from caching 
the entire result set in memory.

But something seems very wrong here and maybe driver tuning is really not what 
you need. 18 minutes to index 500 documents is extreme. Unless the documents 
were huge or you were doing something very unusual, I'd expect this to happen 
in seconds (1 second?). Are you indexing on a Raspberry Pi?

Possibly you have a Cartesian join somewhere in your SQL, or some other little 
mistake? If you post your entire data-config.xml, possibly someone will see the 
error. Or could you be extremely memory constrained because of bad JVM heap 
choices? Do your logs show the JVM constantly in GC cycles?

Just a little note: batchSize goes on the <dataSource /> tag, not on 
<document />. I really don't think tweaking batchSize is going to fix this 
though.
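
A minimal sketch of that placement, reusing the connection details and query from the configuration posted in this thread. The batchSize value of 500 is only an illustrative assumption, not a recommendation, and the remaining field mappings are left out:

```xml
<dataConfig>
  <!-- batchSize is an attribute of the data source, which is what talks to the JDBC driver -->
  <dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://server1\sql2012;databaseName=DBName"
              user="x" password="x"
              batchSize="500" />
  <!-- no batchSize here: the document element is not where the fetch size is read from -->
  <document name="Search">
    <entity name="Search" query="select top 500 * from view">
      <field column="ID" name="Id" />
      <!-- other field mappings omitted -->
    </entity>
  </document>
</dataConfig>
```

Note the camel-case spelling batchSize; the attribute name is case-sensitive as far as I know, so a lowercase batchsize would likely be ignored.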

James Dyer
Ingram Content Group
(615) 213-4311


RE: SOLR - Unable to execute query error - DIH

2013-03-28 Thread Swati Swoboda
What version of Solr4 are you running? We are on 3.6.2 so I can't be confident 
whether these settings still exist (they probably do...), but here is what we 
do to speed up full-indexing:

In solrconfig.xml, increase your ramBufferSizeMB to 128.
Increase mergeFactor to 20.
Make sure autoCommit is disabled.

Basically, you want to minimize how often Lucene/Solr flushes (as that is very 
time consuming). Merging is also very time consuming, so you want large 
segments and fewer merges (hence the merge factor increase). We use these 
settings when we are doing our initial full-indexing and then switch them over 
to saner defaults for our regular/delta indexing.
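
As a rough sketch, the three settings above might look like this in a Solr 4 solrconfig.xml. The element placement is an assumption based on the stock example config: in 3.x releases the first two elements sat under indexDefaults rather than indexConfig, and the maxTime value in the commented-out block is purely illustrative:

```xml
<config>
  <!-- other settings omitted -->

  <indexConfig>
    <!-- buffer more documents in RAM before flushing a segment to disk -->
    <ramBufferSizeMB>128</ramBufferSizeMB>
    <!-- allow more segments to accumulate before a merge is triggered -->
    <mergeFactor>20</mergeFactor>
  </indexConfig>

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- autoCommit disabled for the bulk load by leaving it commented out
    <autoCommit>
      <maxTime>15000</maxTime>
    </autoCommit>
    -->
  </updateHandler>
</config>
```

With autoCommit disabled, nothing becomes durable until the import issues its commit at the end, so these values suit a one-off full import better than day-to-day indexing.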

Rollbacks concern me; why did your query roll back? Did it give an error -- it 
should have. It should be in your Solr log file. Was it because the connection 
timed out? It's important to find out. We prevented rollbacks by effectively 
splitting our data across entities and then indexing one entity at a time. This 
allowed us to make sure that if one sector failed, it didn't impact the 
entire process. (This can also be done by using autoCommit, but that slows down 
indexing.) 
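
One way such a split could look in data-config.xml is sketched below. The entity names and the ID ranges are hypothetical, and the sketch assumes ID is a numeric column that can be used to slice the view:

```xml
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://server1\sql2012;databaseName=DBName"
              user="x" password="x" />
  <document name="Search">
    <!-- hypothetical slices of the same view, each imported on its own -->
    <entity name="SearchPart1" query="select * from view where ID between 1 and 250000">
      <field column="ID" name="Id" />
    </entity>
    <entity name="SearchPart2" query="select * from view where ID between 250001 and 500000">
      <field column="ID" name="Id" />
    </entity>
    <entity name="SearchPart3" query="select * from view where ID between 500001 and 750000">
      <field column="ID" name="Id" />
    </entity>
  </document>
</dataConfig>
```

Each slice can then be run separately by passing the entity parameter to the import handler, for example command=full-import&entity=SearchPart1&clean=false (the handler path depends on how DIH is registered in solrconfig.xml). The clean=false part matters because a full-import cleans the index by default, which would wipe the slices that were already indexed.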

If you're getting OOM errors, be sure that your Xmx value is set high enough 
(and that you have enough memory). You may be able to increase ramBufferSizeMB 
further depending on how much memory you have (we didn't have much). 

Hope this helps.
Swati


Re: SOLR - Unable to execute query error - DIH

2013-03-28 Thread Chris Hostetter

: I am trying to index data from SQL Server view to the SOLR using the DIH

Have you ruled out the view itself being the bottleneck?

Try running whatever command line SQL Server client exists on your SOLR 
server to connect remotely to your existing SQL Server, run select * 
from view, and redirect the output to a file.

That will give you an absolute baseline for the best possible 
performance you could hope for when indexing into Solr, and tip 
you off to whether the view is the problem when asking for more than a 
handful of documents.



-Hoss


Re: SOLR - Unable to execute query error - DIH

2013-03-25 Thread kobe.free.wo...@gmail.com
In the context of the above scenario, when I try to index a set of 500 rows, it
fetches and indexes around 400 rows and then shows no progress and
keeps on executing. What could be the cause of this issue? If any of you
have run into such a scenario, please share the details.



RE: SOLR - Unable to execute query error - DIH

2013-03-25 Thread Dyer, James
With MS SQL Server, try adding selectMethod=cursor to your connection string 
and set your batch size to a reasonable amount (or possibly just omit it; DIH 
has a default value it will use).
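
For illustration, the suggestion applied to the data source from the original post might look like the fragment below. Driver properties are appended to the JDBC URL separated by semicolons, and batchSize is simply left out so that the DIH default applies:

```xml
<!-- selectMethod=cursor asks the driver for a server-side cursor
     instead of reading the whole result set into client memory -->
<dataSource type="JdbcDataSource"
            driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
            url="jdbc:sqlserver://server1\sql2012;databaseName=DBName;selectMethod=cursor"
            user="x" password="x" />
```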

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: kobe.free.wo...@gmail.com [mailto:kobe.free.wo...@gmail.com] 
Sent: Monday, March 25, 2013 3:25 AM
To: solr-user@lucene.apache.org
Subject: SOLR - Unable to execute query error - DIH

Hello All,

I am trying to index data from a SQL Server view into SOLR using the DIH
with the full-import command. The view has 750K rows and 427 columns. During
the first execution I indexed only the first 50 rows of the view, and the data
got indexed in 10 min. But when I ran the same scenario to index the
complete set of 750K rows, the execution continued for 2 days and then
rolled back, giving me the following error:

Unable to execute the query: select * from.

Following is my DIH configuration file,

<dataConfig>
  <dataSource type="JdbcDataSource"
    driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
    url="jdbc:sqlserver://server1\sql2012;databaseName=DBName" user="x"
    password="x" />
  <document name="Search" batchsize="1">
    <entity name="Search" query="select top 500 * from view">
      <field column="ID" name="Id" />

As suggested in some of the posts, I did try batchsize=-1, but it didn't
work. Please advise whether this is the correct approach or whether any
parameter needs to be modified for tuning.

Thanks!


