[jira] [Commented] (SOLR-3360) Problem with DataImportHandler multi-threaded

2012-05-07 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269758#comment-13269758
 ] 

Mikhail Khludnev commented on SOLR-3360:


James,

I'm looking into it, but how is where="xid=x.id" possible with 
processor="SqlEntityProcessor"? It seems to me that your version of the ternary 
operator is not as restrictive as mine. 

 Problem with DataImportHandler multi-threaded
 -

 Key: SOLR-3360
 URL: https://issues.apache.org/jira/browse/SOLR-3360
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.6
 Environment: Solr 3.6.0, Apache Tomcat 6.0.20, jdk1.6.0_15, Windows XP
Reporter: Claudio R
Assignee: James Dyer
 Fix For: 3.6.1

 Attachments: SOLR-3360-test.patch, SOLR-3360-test.patch, 
 SOLR-3360-test.patch, SOLR-3360-test.patch, SOLR-3360.patch


 Hi,
 If I use dataimport with 1 thread, I get:
 <lst name="statusMessages">
   <str name="Total Requests made to DataSource">5001</str>
   <str name="Total Rows Fetched">1000</str>
   <str name="Total Documents Skipped">0</str>
   <str name="Full Dump Started">2012-04-16 11:21:57</str>
   <str name="">Indexing completed. Added/Updated: 1000 documents. Deleted 0 documents.</str>
   <str name="Committed">2012-04-16 11:23:19</str>
   <str name="Total Documents Processed">1000</str>
   <str name="Time taken">0:1:22.390</str>
 </lst>
 If I use dataimport with 10 threads, I get:
 <lst name="statusMessages">
   <str name="Total Requests made to DataSource">0</str>
   <str name="Total Rows Fetched">1</str>
   <str name="Total Documents Skipped">0</str>
   <str name="Full Dump Started">2012-04-16 11:31:43</str>
   <str name="">Indexing completed. Added/Updated: 1 documents. Deleted 0 documents.</str>
   <str name="Committed">2012-04-16 11:41:50</str>
   <str name="Total Documents Processed">1</str>
   <str name="Time taken">0:10:7.586</str>
 </lst>
 The configuration with 10 threads took about 7 times longer (0:10:7.586 versus 
 0:1:22.390) than the configuration with 1 thread, and reported only 1 document.
 I have 1000 records in the database.
 My db-data-config.xml is shown below:
 <?xml version="1.0" encoding="UTF-8" ?>
 <dataConfig>
   <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
               url="jdbc:sqlserver://200.XXX.XXX.XXX:1433;databaseName=test"
               user="user" password="pass"/>
   <document>
     <entity name="indice" rootEntity="true" threads="10"
             transformer="RegexTransformer,TemplateTransformer"
             query="select top 1000 i.id_indice, i.a, i.b from indice i where i.status = 'I'"
             deltaImportQuery="i.id_indice, i.a, i.b from indice i where id_indice in ('${dataimporter.delta.id_indice}')"
             deltaQuery="select id_indice from indice where status='I' and data_hora_modificacao = convert(datetime, '${dataimporter.last_index_time}', 120)"
             deletedPkQuery="select id_indice from indice where status='D' and data_hora_modificacao = convert(datetime, '${dataimporter.last_index_time}', 120)">
       <field column="id_indice" name="id_indice" />
       <field column="a" name="a" />
       <field column="b" name="b" />
       <entity name="filtro"
               transformer="RegexTransformer,TemplateTransformer"
               query="select categoria, sub_categoria from filtro where indice_id_indice = '${indice.id_indice}'">
         <field name="filtro_categoria" column="categoria" />
         <field name="filtro_sub_categoria" column="sub_categoria" />
         <field name="nv_sub_categoria" column="nv_sub_categoria"
                template="${filtro.categoria}|${filtro.sub_categoria}" />
       </entity>
       <entity name="pagina_relacionada"
               query="select url from pagina_relacionada where indice_id_indice = '${indice.id_indice}'">
         <field name="pagina_relacionada_url" column="url" />
       </entity>
       <entity name="veja_mais"
               query="select chamada, url from veja_mais where indice_id_indice = '${indice.id_indice}'">
         <field name="veja_mais_chamada" column="chamada" />
         <field name="veja_mais_url" column="url" />
       </entity>
       <entity name="video"
               query="select url from video where indice_id_indice = '${indice.id_indice}'">
         <field name="video_url" column="url" />
       </entity>
       <entity name="galeria"
               query="select url from galeria where indice_id_indice = '${indice.id_indice}'">
         <field name="galeria_url" column="url" />
       </entity>
     </entity>
   </document>
 </dataConfig>
 Thanks.
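
 A note on the request count in the single-thread run: 5001 is what an N+1 
 query pattern would give for this configuration, namely one query for the root 
 entity indice plus one query per root row for each of the five sub-entities 
 (filtro, pagina_relacionada, veja_mais, video, galeria). The sketch below only 
 illustrates that arithmetic; the class and method names are hypothetical and 
 are not part of Solr.

```java
/** Hypothetical helper, not part of Solr: counts queries in an N+1 style import. */
final class NPlusOneRequestCount {

    /**
     * One query for the root entity, then one query per root row
     * for every sub-entity nested directly under it.
     */
    static long expectedRequests(long rootRows, int subEntities) {
        return 1 + rootRows * subEntities;
    }

    public static void main(String[] args) {
        // Values taken from the report: 1000 root rows, 5 sub-entities.
        System.out.println(expectedRequests(1000, 5)); // prints 5001
    }
}
```

 With a cached (in-memory) join the per-row term would disappear, which is why 
 the request count is a useful signal when comparing the two approaches.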

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3360) Problem with DataImportHandler multi-threaded

2012-05-07 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269925#comment-13269925
 ] 

Mikhail Khludnev commented on SOLR-3360:


James,

I can't reproduce the failure. 
mkhl$ ant test-contrib 
-Dtests.seed=-55eeb72d0a16dfec:4e1a59f5738a6b25:4bf3cbf2bd3b659a 
-Dtestcase=TestThreaded

junit report 

{code}
<property name="tests.seed" value="-55eeb72d0a16dfec:4e1a59f5738a6b25:4bf3cbf2bd3b659a" />

  <testcase classname="org.apache.solr.handler.dataimport.TestThreaded" name="testCachedThread_FullImport" time="0.965" />
  <testcase classname="org.apache.solr.handler.dataimport.TestThreaded" name="testCachedThreadless_FullImport" time="0.055" />
  <testcase classname="org.apache.solr.handler.dataimport.TestThreaded" name="testCachedSingleThread_FullImport" time="0.053" />

{code}


Finally, I added a _Total test, which enumerates all test params: 
https://github.com/m-khl/solr-patches/commit/0532e653a3319247519f90bd8987c84171ac6a56.diff

On a Core i5 MacBook Pro:

:solr mkhl$ ant test-contrib -Dtestcase=TestThreaded
junit-sequential:
[junit] Testsuite: org.apache.solr.handler.dataimport.TestThreaded
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 3.899 sec
[junit] 

junit-parallel:

{code}
  </properties>
  <testcase classname="org.apache.solr.handler.dataimport.TestThreaded" name="testCachedThread_FullImport" time="1.042" />
  <testcase classname="org.apache.solr.handler.dataimport.TestThreaded" name="testCachedThreadless_FullImport" time="0.047" />
  <testcase classname="org.apache.solr.handler.dataimport.TestThreaded" name="testCachedSingleThread_FullImport" time="0.052" />
  <testcase classname="org.apache.solr.handler.dataimport.TestThreaded" name="testCachedThread_Total" time="0.898" />
  <system-out><![CDATA[]]></system-out>
{code}


Please give me a clue as to how to reproduce the failure. Do you use an IDE or 
a script? Did you run clean before the test? Please show me the exact command, 
the junit report, the log/output, etc.


[jira] [Commented] (SOLR-3360) Problem with DataImportHandler multi-threaded

2012-05-04 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268557#comment-13268557
 ] 

Mikhail Khludnev commented on SOLR-3360:


TestThreaded in the patch has the following significant additions:

testNPlusOneThreadless_FullImport()
testNPlusOneSingleThread_FullImport() 
testNPlusOneTenThreads_FullImport()

These tests use a separate dataconfig called *dataConfigNPulsOne*, which has 
where y.A=${x.id}, just like the topic starter's config. The current tests cover 
only the where="xid=x.id" in-memory join scenario.
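
To make the difference between the two scenarios concrete, here is a minimal sketch of the in-memory join idea: the child rows are fetched once, grouped by their join column, and each parent row is then resolved by a map lookup instead of a new query. All names here are hypothetical and the code is not taken from Solr or from the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a cached (in-memory) join keyed by the child's join column. */
final class InMemoryJoinSketch {

    /** Convenience factory for a child row with the columns "xid" and "desc". */
    static Map<String, Object> row(Object xid, String desc) {
        Map<String, Object> r = new HashMap<>();
        r.put("xid", xid);
        r.put("desc", desc);
        return r;
    }

    /** Builds the cache once: child rows grouped by the value of the join column. */
    static Map<Object, List<Map<String, Object>>> indexBy(
            String joinColumn, List<Map<String, Object>> childRows) {
        Map<Object, List<Map<String, Object>>> cache = new HashMap<>();
        for (Map<String, Object> r : childRows) {
            cache.computeIfAbsent(r.get(joinColumn), k -> new ArrayList<>()).add(r);
        }
        return cache;
    }

    /** Per-parent lookup against the cache; no further data source request is made. */
    static List<Map<String, Object>> childrenOf(
            Map<Object, List<Map<String, Object>>> cache, Object parentId) {
        return cache.getOrDefault(parentId, Collections.emptyList());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> y = new ArrayList<>();
        y.add(row(1, "one"));
        y.add(row(1, "uno"));
        y.add(row(2, "two"));
        Map<Object, List<Map<String, Object>>> cache = indexBy("xid", y);
        System.out.println(childrenOf(cache, 1).size()); // two children for parent id 1
    }
}
```

In the N+1 scenario, by contrast, the lookup step is replaced by a fresh query per parent row with the parent's id substituted into the where clause, so any bug that repeats or drops those queries shows up directly in the request count.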

Also, all data provided to MockDataSource is wrapped in new Once(parentRow), 
which enforces verification of the issue at hand: the halting or 
query-repeating problem. 
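
For readers without the patch at hand, a "use once" guard of this kind could look roughly like the sketch below. This is a hypothetical reconstruction for illustration; the actual Once class in the patch may be implemented differently, and the name OnceSketch is made up.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical "use once" wrapper: hands out its rows a single time and
 * fails loudly if the same result is requested again.
 */
final class OnceSketch<T> implements Iterable<T> {
    private final Iterable<T> delegate;
    private final AtomicBoolean consumed = new AtomicBoolean(false);

    OnceSketch(Iterable<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public Iterator<T> iterator() {
        // compareAndSet keeps the check correct when several threads race here.
        if (!consumed.compareAndSet(false, true)) {
            throw new IllegalStateException("rows were requested more than once");
        }
        return delegate.iterator();
    }

    public static void main(String[] args) {
        OnceSketch<String> rows = new OnceSketch<>(Arrays.asList("r1", "r2"));
        for (String r : rows) {
            System.out.println(r);
        }
        try {
            rows.iterator(); // a repeated query would end up here
        } catch (IllegalStateException expected) {
            System.out.println("second request rejected");
        }
    }
}
```

A test data source that returns such wrappers turns a silently repeated query into an immediate test failure, which is the property the new tests rely on.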

Regards


[jira] [Commented] (SOLR-3360) Problem with DataImportHandler multi-threaded

2012-04-24 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13260351#comment-13260351
 ] 

Mikhail Khludnev commented on SOLR-3360:


James,

I checked the commit 
http://svn.apache.org/viewvc?view=revision&revision=1329444 and the code changes 
look OK. But I insist on committing TestThreaded too. It asserts quite important 
N+1 cases and the Once-DataSource semantics. Please have a look at these 
essential, but not final, test coverage improvements: 
https://github.com/m-khl/solr-patches/commit/0a98a827a2df6373ed7a227a240c822e2c150486#solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestThreaded.java
 . The patch already has these changes for TestThreaded.java. Would you like me 
to raise a separate issue for improving test coverage, or do you want me to 
polish them somehow? 
  


[jira] [Commented] (SOLR-3360) Problem with DataImportHandler multi-threaded

2012-04-23 Thread Claudio R (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13259618#comment-13259618
 ] 

Claudio R commented on SOLR-3360:
-

Hi Mikhail,

I didn't apply the SOLR-3360.patch. My test was run against version 3.6.0 final.
Which svn revision should I apply the patch to?


[jira] [Commented] (SOLR-3360) Problem with DataImportHandler multi-threaded

2012-04-23 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13259843#comment-13259843
 ] 

Mikhail Khludnev commented on SOLR-3360:


Claudio,
The patched sources are at https://github.com/m-khl/solr-patches/tree/solr3360
The patched jar is at 
https://github.com/downloads/m-khl/solr-patches/apache-solr-dataimporthandler-3.6.1-SNAPSHOT.jar
I work with http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_3_6/


[jira] [Commented] (SOLR-3360) Problem with DataImportHandler multi-threaded

2012-04-23 Thread Claudio R (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13259868#comment-13259868
 ] 

Claudio R commented on SOLR-3360:
-

Hi Mikhail,

I checked out 
http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_6_0, applied 
SOLR-3360.patch, and compiled the dataimporthandler. The code worked correctly. 
The log no longer shows the repeated queries it showed before.

I got:

<lst name="statusMessages">
   <str name="Total Requests made to DataSource">0</str>
   <str name="Total Rows Fetched">1000</str>
   <str name="Total Documents Skipped">0</str>
   <str name="Full Dump Started">2012-04-23 15:56:48</str>
   <str name="">Indexing completed. Added/Updated: 1000 documents. Deleted 0 documents.</str>
   <str name="Committed">2012-04-23 15:57:29</str>
   <str name="Total Documents Processed">1000</str>
   <str name="Time taken">0:0:41.390</str>
</lst>

The time spent decreased. 
Before, with 1 thread I had obtained 0:1:22.390
Now, with 10 threads I obtain 0:0:41.390
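
For comparing such runs, the "Time taken" values can be converted to milliseconds, as in the rough sketch below. It assumes the layout hours:minutes:seconds.millis, with the last field read as a millisecond count rather than a decimal fraction; that reading is an assumption, and the class name is hypothetical and not part of Solr. With the two values above the ratio works out to about 1.99.

```java
/** Hypothetical helper, not part of Solr. */
final class DihTimeTaken {

    /**
     * Converts a "Time taken" value such as "0:1:22.390" to milliseconds.
     * Assumes hours:minutes:seconds.millis, where the last field is a
     * millisecond count and not a decimal fraction of a second.
     */
    static long toMillis(String timeTaken) {
        String[] p = timeTaken.trim().split("[:.]");
        if (p.length != 4) {
            throw new IllegalArgumentException("unexpected format: " + timeTaken);
        }
        long hours = Long.parseLong(p[0]);
        long minutes = Long.parseLong(p[1]);
        long seconds = Long.parseLong(p[2]);
        long millis = Long.parseLong(p[3]);
        return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis;
    }

    public static void main(String[] args) {
        long before = toMillis("0:1:22.390"); // 1 thread, before the patch
        long after = toMillis("0:0:41.390");  // 10 threads, with the patch
        System.out.printf("speedup: %.2fx%n", (double) before / after);
    }
}
```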

Great job Mikhail.
Thank you very much.
Will this fix be present in version 3.6.1?




[jira] [Commented] (SOLR-3360) Problem with DataImportHandler multi-threaded

2012-04-22 Thread Claudio R (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13259333#comment-13259333
 ] 

Claudio R commented on SOLR-3360:
-

Hi Mikhail and James,

When I ran the test with only the root entity (without sub-entities), the 
problem also occurred. So this problem does not appear to be limited to the 
sub-entities (N+1 case).

 Problem with DataImportHandler multi-threaded
 -

 Key: SOLR-3360
 URL: https://issues.apache.org/jira/browse/SOLR-3360
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.6
 Environment: Solr 3.6.0, Apache Tomcat 6.0.20, jdk1.6.0_15, Windows XP
Reporter: Claudio R
 Attachments: SOLR-3360-test.patch, SOLR-3360-test.patch, 
 SOLR-3360.patch


 Hi,
 If I use dataimport with 1 thread, I get:
 <lst name="statusMessages">
   <str name="Total Requests made to DataSource">5001</str>
   <str name="Total Rows Fetched">1000</str>
   <str name="Total Documents Skipped">0</str>
   <str name="Full Dump Started">2012-04-16 11:21:57</str>
   <str name="">Indexing completed. Added/Updated: 1000 documents. Deleted 0 documents.</str>
   <str name="Committed">2012-04-16 11:23:19</str>
   <str name="Total Documents Processed">1000</str>
   <str name="Time taken">0:1:22.390</str>
 </lst>
 If I use dataimport with 10 threads, I get:
 <lst name="statusMessages">
   <str name="Total Requests made to DataSource">0</str>
   <str name="Total Rows Fetched">1</str>
   <str name="Total Documents Skipped">0</str>
   <str name="Full Dump Started">2012-04-16 11:31:43</str>
   <str name="">Indexing completed. Added/Updated: 1 documents. Deleted 0 documents.</str>
   <str name="Committed">2012-04-16 11:41:50</str>
   <str name="Total Documents Processed">1</str>
   <str name="Time taken">0:10:7.586</str>
 </lst>
 The configuration with 10 threads took 10 times longer than the 
 configuration with 1 thread.
 I have 1000 records in the database.
 My db-data-config.xml is shown below:
 <?xml version="1.0" encoding="UTF-8" ?>
 <dataConfig>
   <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
     url="jdbc:sqlserver://200.XXX.XXX.XXX:1433;databaseName=test" user="user"
     password="pass"/>
   <document>
     <entity name="indice" rootEntity="true" threads="10"
       transformer="RegexTransformer,TemplateTransformer"
       query="select top 1000 i.id_indice, i.a, i.b from indice i where i.status = 'I'"
       deltaImportQuery="i.id_indice, i.a, i.b from indice i where id_indice in ('${dataimporter.delta.id_indice}')"
       deltaQuery="select id_indice from indice where status='I' and data_hora_modificacao = convert(datetime, '${dataimporter.last_index_time}', 120)"
       deletedPkQuery="select id_indice from indice where status='D' and data_hora_modificacao = convert(datetime, '${dataimporter.last_index_time}', 120)">
       <field column="id_indice" name="id_indice" />
       <field column="a" name="a" />
       <field column="b" name="b" />
       <entity name="filtro" transformer="RegexTransformer,TemplateTransformer"
         query="select categoria, sub_categoria from filtro where indice_id_indice = '${indice.id_indice}'">
         <field name="filtro_categoria" column="categoria" />
         <field name="filtro_sub_categoria" column="sub_categoria" />
         <field name="nv_sub_categoria" column="nv_sub_categoria"
           template="${filtro.categoria}|${filtro.sub_categoria}" />
       </entity>
       <entity name="pagina_relacionada"
         query="select url from pagina_relacionada where indice_id_indice = '${indice.id_indice}'">
         <field name="pagina_relacionada_url" column="url" />
       </entity>
       <entity name="veja_mais"
         query="select chamada, url from veja_mais where indice_id_indice = '${indice.id_indice}'">
         <field name="veja_mais_chamada" column="chamada" />
         <field name="veja_mais_url" column="url" />
       </entity>
       <entity name="video"
         query="select url from video where indice_id_indice = '${indice.id_indice}'">
         <field name="video_url" column="url" />
       </entity>
       <entity name="galeria"
         query="select url from galeria where indice_id_indice = '${indice.id_indice}'">
         <field name="galeria_url" column="url" />
       </entity>
     </entity>
   </document>
 </dataConfig>
 Thanks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3360) Problem with DataImportHandler multi-threaded

2012-04-22 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13259388#comment-13259388
 ] 

Mikhail Khludnev commented on SOLR-3360:


Claudio,

It's not clear what you did. Have you applied the attached SOLR-3360.patch?

[jira] [Commented] (SOLR-3360) Problem with DataImportHandler multi-threaded

2012-04-21 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13258851#comment-13258851
 ] 

Mikhail Khludnev commented on SOLR-3360:


James,
I checked what was committed in SOLR-3011. The problem is that all N+1 cases 
(<entity name="y" query="select * from y where y.A=${x.id}">) were dropped 
from my patch for TestThreaded.java from 16 March. Once they were no longer 
covered, I suppose the halting problem (child entities selected again and 
again) was introduced by the fix for SOLR-3307 (shame on me, I was out of the 
loop). My plan is to bring the N+1 cases back into TestThreaded.java and 
provide a correct fix for SOLR-3307. This is just a first impression. The worst 
case is that the SOLR-3307 fix conflicts with fixing the halting problem.
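
For reference, the N+1 pattern discussed here seems to account for the request counter in the original report. The following is only a back-of-the-envelope sketch, assuming each sub-entity issues exactly one query per parent row; the class and method names are hypothetical and not part of DIH.

```java
public class NPlusOneEstimate {
    // Hypothetical helper (not a DIH API): one query for the root entity,
    // plus one query per sub-entity for every row the root entity returns.
    public static long expectedRequests(long parentRows, int subEntities) {
        return 1 + parentRows * subEntities;
    }

    public static void main(String[] args) {
        // 1000 root rows and 5 sub-entities (filtro, pagina_relacionada,
        // veja_mais, video, galeria), as in the reported configuration.
        System.out.println(expectedRequests(1000, 5)); // prints 5001
    }
}
```

Under that assumption the result agrees with the 5001 "Total Requests made to DataSource" reported for the single-thread run.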

[jira] [Commented] (SOLR-3360) Problem with DataImportHandler multi-threaded

2012-04-21 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13258940#comment-13258940
 ] 

Mikhail Khludnev commented on SOLR-3360:


Ok.

I picked up the old test 
https://raw.github.com/m-khl/solr-patches/solr3011/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestThreaded.java

For testNPlusOneTenThreads_FullImport I get:
21-04-2012 14:55:39 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {deleteByQuery=*:*,add=[3, 2, 4, 2, 1, 1, 2, 2, ... (40 adds)],commit=} 0 2086
which is what this issue is about. 

A single thread is fine:
testNPlusOneSingleThread_FullImport()
21.04.2012 13:08:21 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {deleteByQuery=*:*,add=[2, 3, 4, 1],commit=} 0 1289

So this test can reproduce the problem but does not actually assert on it. I 
need to make it more restrictive. 
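
One way a more restrictive check might look is to verify that no document id is added twice. This is only an illustrative sketch, not the actual TestThreaded.java code; the class and method names are hypothetical, and the ids are the ones visible in the log lines above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateAddCheck {
    // Returns the ids that occur more than once, each listed once,
    // in the order in which the first repeat was seen.
    public static List<String> findDuplicates(List<String> addedIds) {
        Set<String> seen = new HashSet<String>();
        Set<String> reported = new HashSet<String>();
        List<String> duplicates = new ArrayList<String>();
        for (String id : addedIds) {
            // seen.add returns false when the id was already added before.
            if (!seen.add(id) && reported.add(id)) {
                duplicates.add(id);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // The eight ids visible in the ten-thread log line.
        List<String> tenThreads = Arrays.asList("3", "2", "4", "2", "1", "1", "2", "2");
        System.out.println(findDuplicates(tenThreads)); // prints [2, 1]

        // The ids from the single-thread log line.
        List<String> singleThread = Arrays.asList("2", "3", "4", "1");
        System.out.println(findDuplicates(singleThread)); // prints []
    }
}
```

A test could then fail whenever the returned list is non-empty, which the ten-thread run above would trigger.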





[jira] [Commented] (SOLR-3360) Problem with DataImportHandler multi-threaded

2012-04-16 Thread James Dyer (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254765#comment-13254765
 ] 

James Dyer commented on SOLR-3360:
--

Claudio,

Thanks for reporting this.  Was this working previously with 3.5?  (We did some 
work with the threads feature in 3.6, so it'd be helpful to know if this is a 
new bug).  

Also, can you try it (1) without any transformers and (2) with just the parent 
entity (take out the sub-entities).  Do you get 10,000 or 1,000 ?  This might 
help in diagnosing and maybe solving this problem.

Finally, you may want to be aware that 3.6 is the last release that will 
support the DIH threads feature.  It simply had too many bugs and was too 
difficult to maintain to keep it in.  But we did try and fix as many bugs for 
3.6 as we could.  Possibly in fixing what we could, we introduced this as a 
new problem?

[jira] [Commented] (SOLR-3360) Problem with DataImportHandler multi-threaded

2012-04-16 Thread Claudio R (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254855#comment-13254855
 ] 

Claudio R commented on SOLR-3360:
-

Hi James,

With version 3.5.0, I got unstable behavior with 10 threads. The first 
full-import succeeded:

<lst name="statusMessages">
   <str name="Total Requests made to DataSource">0</str>
   <str name="Total Rows Fetched">1000</str>
   <str name="Total Documents Skipped">0</str>
   <str name="Full Dump Started">2012-04-16 14:12:08</str>
   <str name="">Indexing completed. Added/Updated: 1000 documents. Deleted 0 documents.</str>
   <str name="Committed">2012-04-16 14:13:21</str>
   <str name="Optimized">2012-04-16 14:13:21</str>
   <str name="Total Documents Processed">1000</str>
   <str name="Time taken ">0:1:12.875</str>
</lst>

But on the second and third full-import I got "Indexing failed. Rolled back 
all changes."

<lst name="statusMessages">
   <str name="Time Elapsed">0:0:6.906</str>
   <str name="Total Requests made to DataSource">0</str>
   <str name="Total Rows Fetched">12</str>
   <str name="Total Documents Processed">11</str>
   <str name="Total Documents Skipped">0</str>
   <str name="Full Dump Started">2012-04-16 14:15:38</str>
   <str name="">Indexing failed. Rolled back all changes.</str>
   <str name="Rolledback">2012-04-16 14:15:43</str>
</lst>

In catalina.out, I got:

SEVERE: Full Import failed:java.lang.RuntimeException: Error in multi-threaded import
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:265)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select categoria, sub_categoria from filtro where indice_id_indice = '257346'
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
    at org.apache.solr.handler.dataimport.ThreadedEntityProcessorWrapper.nextRow(ThreadedEntityProcessorWrapper.java:84)
    at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.runAThread(DocBuilder.java:446)
    at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.run(DocBuilder.java:399)
    at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.runAThread(DocBuilder.java:466)
    at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.access$000(DocBuilder.java:353)
    at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner$1.run(DocBuilder.java:406)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: socket closed
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:1368)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:1355)
    at com.microsoft.sqlserver.jdbc.TDSChannel.read(IOBuffer.java:1532)
    at com.microsoft.sqlserver.jdbc.TDSReader.readPacket(IOBuffer.java:3274)
    at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:4433)
    at com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:784)
    at com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:685)
    at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:4026)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1416)
    at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:185)
    at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:160)
    at com.microsoft.sqlserver.jdbc.SQLServerStatement.execute(SQLServerStatement.java:658)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:246)
    ... 13 more

In version 3.6.0 I did not get unstable behavior as obtained in 

[jira] [Commented] (SOLR-3360) Problem with DataImportHandler multi-threaded

2012-04-16 Thread Claudio R (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254875#comment-13254875
 ] 

Claudio R commented on SOLR-3360:
-

I ran version 3.6.0 with 20 threads and got 2 documents processed:

<lst name="statusMessages">
   <str name="Total Requests made to DataSource">0</str>
   <str name="Total Rows Fetched">2</str>
   <str name="Total Documents Skipped">0</str>
   <str name="Full Dump Started">2012-04-16 15:10:22</str>
   <str name="">Indexing completed. Added/Updated: 2 documents. Deleted 0 documents.</str>
   <str name="Committed">2012-04-16 15:24:04</str>
   <str name="Total Documents Processed">2</str>
   <str name="Time taken">0:13:42.110</str>
</lst>

[jira] [Commented] (SOLR-3360) Problem with DataImportHandler multi-threaded

2012-04-16 Thread Mikhail Khludnev (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254919#comment-13254919
 ] 

Mikhail Khludnev commented on SOLR-3360:


Claudio,

Thank you for providing feedback. I have several considerations for your issue:

# To be honest, I didn't pay much attention to these counters when fixing 
threads, and I didn't assert on them, so it might be a bug in the counters. But 
the main question is whether your index is correct. Does it have the expected 
number of docs? Were all master entities properly connected to the detail 
ones? Please let us know your observations.

# Even if the DIH code is correct, you are adding too many threads. The reason 
for adding threads is to get high CPU utilization; if you exceed your IO 
limits, you waste CPU time on contention. Could you start from 2? 

# I suppose significant time was spent obtaining JDBC connections; by the way, 
how many of them are available in parallel? If you are not happy with how DIH 
scales, you can check what it spends its time on. Logs with debug level enabled 
for DIH are appreciated. You can also take samples with jconsole, or even 
manually run jstack JVMPID.

Thanks


[jira] [Commented] (SOLR-3360) Problem with DataImportHandler multi-threaded

2012-04-16 Thread James Dyer (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254940#comment-13254940
 ] 

James Dyer commented on SOLR-3360:
--

I think we need to verify whether or not it is adding the same 1000 documents 
10x, or if it's just counting each document 10x.  The fact that the successful 
10-thread 3.5 run took 1:12 but that same 10-thread run on 3.6 took 14:15 makes 
me wonder if each thread is actually duplicating the work and not just doing 
extra counting?

But then again the successful ONE-thread 3.6 run took 1:12 also... hmm...

Probably we need a unit test that does a simple SQL import with 2 threads and 
counts how many times SolrWriter#upload got called, then compares that count 
with both the # of docs sent and the # of docs reported to the user.  Then we'd 
know what is actually broken.  It'd be interesting to see what that same test 
does against 3.5 (if it can be made to run to completion).  Possibly this is 
broken in 3.5 too (except the counters) but nobody noticed because they always 
hit synchronization problems and gave up?
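
The check proposed above can be sketched outside of Solr. The following is a 
minimal Python simulation, not DIH code: CountingWriter, run_shared and 
run_duplicated are hypothetical names, and CountingWriter only stands in for 
the real writer so that upload calls can be counted. It compares the number of 
rows sent with the number of upload calls and the number of distinct documents, 
which is enough to tell "each thread repeats the work" apart from "only the 
counters are wrong".

```python
import threading
from queue import Queue, Empty


class CountingWriter:
    """Hypothetical stand-in for the document writer; counts upload calls."""

    def __init__(self):
        self._lock = threading.Lock()
        self.uploads = 0
        self.doc_ids = []

    def upload(self, doc_id):
        # The lock keeps the counter exact when several threads write at once.
        with self._lock:
            self.uploads += 1
            self.doc_ids.append(doc_id)


def _run(target, num_threads):
    threads = [threading.Thread(target=target) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def run_shared(rows, num_threads, writer):
    """Expected behaviour: threads drain one shared queue, one upload per row."""
    queue = Queue()
    for row in rows:
        queue.put(row)

    def worker():
        while True:
            try:
                row = queue.get_nowait()
            except Empty:
                return
            writer.upload(row)

    _run(worker, num_threads)


def run_duplicated(rows, num_threads, writer):
    """Suspected failure mode: every thread walks the full row set itself."""

    def worker():
        for row in rows:
            writer.upload(row)

    _run(worker, num_threads)


def summarize(rows, writer):
    """Return (rows sent, upload calls, distinct documents written)."""
    return len(rows), writer.uploads, len(set(writer.doc_ids))


if __name__ == "__main__":
    rows = list(range(1000))
    for runner in (run_shared, run_duplicated):
        writer = CountingWriter()
        runner(rows, 2, writer)
        print(runner.__name__, summarize(rows, writer))
```

If the upload count equals the row count, only the reported counters can be 
wrong; if it is a multiple of the row count while the distinct count stays the 
same, the threads are duplicating work.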


[jira] [Commented] (SOLR-3360) Problem with DataImportHandler multi-threaded

2012-04-16 Thread Claudio R (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255044#comment-13255044
 ] 

Claudio R commented on SOLR-3360:
-

Hi James Dyer and Mikhail Khludnev,

I added the line below to Tomcat's logging.properties:

org.apache.solr.handler.dataimport.JdbcDataSource.level=FINE

Then I ran the import again on 3.6.0 with 10 threads.
The select below was executed 10 times:

select url from video where indice_id_indice = '257933'

This sub-entity select should have been executed only once.
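
To check every statement rather than one, the repeats can be counted. This is a 
minimal Python sketch under the assumption that the SQL statements have already 
been pulled out of the FINE-level log into a list of strings; 
repeated_statements is a hypothetical helper, and the sample list is 
illustrative rather than real log output.

```python
from collections import Counter


def repeated_statements(statements):
    """Return {sql: count} for every statement that appears more than once."""
    counts = Counter(s.strip() for s in statements)
    return {sql: n for sql, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Illustrative input: the video select seen 10 times, plus one other query.
    stmts = ["select url from video where indice_id_indice = '257933'"] * 10
    stmts.append("select url from galeria where indice_id_indice = '257933'")
    for sql, n in repeated_statements(stmts).items():
        print(n, sql)
```

With one root row, each sub-entity query is expected once, so any entry in the 
result points at duplicated work.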



 Problem with DataImportHandler multi-threaded
 -

 Key: SOLR-3360
 URL: https://issues.apache.org/jira/browse/SOLR-3360
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.6
 Environment: Solr 3.6.0, Apache Tomcat 6.0.20, jdk1.6.0_15, Windows XP
Reporter: Claudio R

 Hi,
 If I use dataimport with 1 thread, I get:
 <lst name="statusMessages">
   <str name="Total Requests made to DataSource">5001</str>
   <str name="Total Rows Fetched">1000</str>
   <str name="Total Documents Skipped">0</str>
   <str name="Full Dump Started">2012-04-16 11:21:57</str>
   <str name="">Indexing completed. Added/Updated: 1000 documents. Deleted 0 documents.</str>
   <str name="Committed">2012-04-16 11:23:19</str>
   <str name="Total Documents Processed">1000</str>
   <str name="Time taken">0:1:22.390</str>
 </lst>
 If I use dataimport with 10 threads, I get:
 <lst name="statusMessages">
   <str name="Total Requests made to DataSource">0</str>
   <str name="Total Rows Fetched">1</str>
   <str name="Total Documents Skipped">0</str>
   <str name="Full Dump Started">2012-04-16 11:31:43</str>
   <str name="">Indexing completed. Added/Updated: 1 documents. Deleted 0 documents.</str>
   <str name="Committed">2012-04-16 11:41:50</str>
   <str name="Total Documents Processed">1</str>
   <str name="Time taken">0:10:7.586</str>
 </lst>
 The configuration with 10 threads took far longer (0:10:7.586) than the 
 configuration with 1 thread (0:1:22.390).
 I have 1000 records in the database.
 My db-data-config.xml is shown below:
 <?xml version="1.0" encoding="UTF-8" ?>
 <dataConfig>
   <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
     url="jdbc:sqlserver://200.XXX.XXX.XXX:1433;databaseName=test" user="user"
     password="pass"/>
   <document>
     <entity name="indice" rootEntity="true" threads="10"
       transformer="RegexTransformer,TemplateTransformer"
       query="select top 1000 i.id_indice, i.a, i.b from indice i where i.status = 'I'"
       deltaImportQuery="i.id_indice, i.a, i.b from indice i where id_indice in ('${dataimporter.delta.id_indice}')"
       deltaQuery="select id_indice from indice where status='I' and data_hora_modificacao = convert(datetime, '${dataimporter.last_index_time}', 120)"
       deletedPkQuery="select id_indice from indice where status='D' and data_hora_modificacao = convert(datetime, '${dataimporter.last_index_time}', 120)">
       <field column="id_indice" name="id_indice" />
       <field column="a" name="a" />
       <field column="b" name="b" />
       <entity name="filtro"
         transformer="RegexTransformer,TemplateTransformer"
         query="select categoria, sub_categoria from filtro where indice_id_indice = '${indice.id_indice}'">
         <field name="filtro_categoria" column="categoria" />
         <field name="filtro_sub_categoria" column="sub_categoria" />
         <field name="nv_sub_categoria" column="nv_sub_categoria"
           template="${filtro.categoria}|${filtro.sub_categoria}" />
       </entity>
       <entity name="pagina_relacionada" query="select url from pagina_relacionada where indice_id_indice = '${indice.id_indice}'">
         <field name="pagina_relacionada_url" column="url" />
       </entity>
       <entity name="veja_mais" query="select chamada, url from veja_mais where indice_id_indice = '${indice.id_indice}'">
         <field name="veja_mais_chamada" column="chamada" />
         <field name="veja_mais_url" column="url" />
       </entity>
       <entity name="video" query="select url from video where indice_id_indice = '${indice.id_indice}'">
         <field name="video_url" column="url" />
       </entity>
       <entity name="galeria" query="select url from galeria where indice_id_indice = '${indice.id_indice}'">
         <field name="galeria_url" column="url" />
       </entity>
     </entity>
   </document>
 </dataConfig>
 Thanks.




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org