RE: DIH import and postImportDeleteQuery

2011-05-25 Thread Dyer, James
Great.  I wasn't aware of the other issue.  I linked the two issues in JIRA
so people can find both in the future.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Alexandre Rocco [mailto:alel...@gmail.com] 
Sent: Wednesday, May 25, 2011 2:34 PM
To: solr-user@lucene.apache.org
Subject: Re: DIH import and postImportDeleteQuery

Hi James,

Thanks for the heads up!
I am currently on version 1.4.1, so I can apply this patch and see if it
works.
I just need to decide whether it's better to apply the patch or to check on
the backend system whether only delete requests were generated and, if so,
skip the DIH call.

Previously, I found another open issue, created by Ephraim:
https://issues.apache.org/jira/browse/SOLR-2104

It's the same issue, but it hasn't had any updates yet.

Regards,
Alexandre

On Wed, May 25, 2011 at 3:17 PM, Dyer, James wrote:

> The "failure to commit" bug with $deleteDocById can be fixed by applying
> patch SOLR-2492.  This patch also partially fixes the "no updated stats" bug
> in that it increments the count by 1 for every call to $deleteDocById and
> $deleteDocByQuery.  Note that this might result in inaccurate counts if the
> id given with $deleteDocById doesn't exist or is duplicated.  Obviously this
> is not a complete fix for stats using $deleteDocByQuery as this command
> would normally be used to delete >1 doc at a time.
>
> The patch is for Trunk but it might work with 3.1 also.  If not, it likely
> only needs minor tweaking.
>
> The jira ticket is here:  https://issues.apache.org/jira/browse/SOLR-2492
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: Alexandre Rocco [mailto:alel...@gmail.com]
> Sent: Wednesday, May 25, 2011 12:54 PM
> To: solr-user@lucene.apache.org
> Subject: Re: DIH import and postImportDeleteQuery
>
> Hi Ephraim,
>
> Thank you so much for the input.
> I was able to find your thread on the archives and got your solution to
> work.
>
> In fact, when using $deleteDocById and $skipDoc it worked like a charm.
> This
> feature is very useful, it's a shame it's not properly documented.
> The only downside is the one you mentioned that the stats are not updated,
> so if I update 13 documents and delete 2, DIH would tell me that only 13
> documents were processed. This is bad in my case because I check the end
> result to generate an error e-mail if needed.
>
> You also mentioned that if the query contains only deletion records, a
> commit would not be automatically executed and it would be necessary to
> commit manually.
>
> How can I commit manually via DIH? I was not able to find any references on
> the documentation.
>
> Thanks!
> Alexandre
>
> On Wed, May 25, 2011 at 5:14 AM, Ephraim Ofir  wrote:
>
> > Search the list for my post "DIH - deleting documents, high performance
> > (delta) imports, and passing parameters" which shows my solution to a
> > similar problem.
> >
> > Ephraim Ofir
> >
> > -Original Message-
> > From: Alexandre Rocco [mailto:alel...@gmail.com]
> > Sent: Tuesday, May 24, 2011 11:24 PM
> > To: solr-user@lucene.apache.org
> > Subject: DIH import and postImportDeleteQuery
> >
> > Guys,
> >
> > I am facing a situation in one of our projects that I need to perform a
> > cleanup to remove some documents after we perform an update via DIH.
> > The big issue right now comes from the fact that when we call the DIH
> > with
> > clean=false, the postImportDeleteQuery is not executed.
> >
> > My setup is currently arranged like this:
> > - A SQL Server stored procedure that receives a parameter (specified in
> > the
> > URL) and returns the records to be indexed
> > - The procedure is able to return all the records (for a full-import) or
> > only the updated records (for a delta-import)
> > - This procedure returns valid and deleted records, from this point
> > comes
> > the need to run a postImportDeleteQuery to remove the deleted ones.
> >
> > Everything works fine when I run a full-import, I am running always with
> > clean=true, and then the whole index is rebuilt.
> > When I need to do an incremental update, the records are updated
> > correctly,
> > but the command to delete the other records is not executed.
> >
> > I've tried several combinations, with different results:
> > - Running full-import with clean=false: the records are updated but the
> > ones
> > that need to be deleted stay in the index
> > - Running delta-import with clean=false: the records are updated but th

Re: DIH import and postImportDeleteQuery

2011-05-25 Thread Alexandre Rocco
Hi James,

Thanks for the heads up!
I am currently on version 1.4.1, so I can apply this patch and see if it
works.
I just need to decide whether it's better to apply the patch or to check on
the backend system whether only delete requests were generated and, if so,
skip the DIH call.

Previously, I found another open issue, created by Ephraim:
https://issues.apache.org/jira/browse/SOLR-2104

It's the same issue, but it hasn't had any updates yet.

Regards,
Alexandre

On Wed, May 25, 2011 at 3:17 PM, Dyer, James wrote:

> The "failure to commit" bug with $deleteDocById can be fixed by applying
> patch SOLR-2492.  This patch also partially fixes the "no updated stats" bug
> in that it increments the count by 1 for every call to $deleteDocById and
> $deleteDocByQuery.  Note that this might result in inaccurate counts if the
> id given with $deleteDocById doesn't exist or is duplicated.  Obviously this
> is not a complete fix for stats using $deleteDocByQuery as this command
> would normally be used to delete >1 doc at a time.
>
> The patch is for Trunk but it might work with 3.1 also.  If not, it likely
> only needs minor tweaking.
>
> The jira ticket is here:  https://issues.apache.org/jira/browse/SOLR-2492
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: Alexandre Rocco [mailto:alel...@gmail.com]
> Sent: Wednesday, May 25, 2011 12:54 PM
> To: solr-user@lucene.apache.org
> Subject: Re: DIH import and postImportDeleteQuery
>
> Hi Ephraim,
>
> Thank you so much for the input.
> I was able to find your thread on the archives and got your solution to
> work.
>
> In fact, when using $deleteDocById and $skipDoc it worked like a charm.
> This
> feature is very useful, it's a shame it's not properly documented.
> The only downside is the one you mentioned that the stats are not updated,
> so if I update 13 documents and delete 2, DIH would tell me that only 13
> documents were processed. This is bad in my case because I check the end
> result to generate an error e-mail if needed.
>
> You also mentioned that if the query contains only deletion records, a
> commit would not be automatically executed and it would be necessary to
> commit manually.
>
> How can I commit manually via DIH? I was not able to find any references on
> the documentation.
>
> Thanks!
> Alexandre
>
> On Wed, May 25, 2011 at 5:14 AM, Ephraim Ofir  wrote:
>
> > Search the list for my post "DIH - deleting documents, high performance
> > (delta) imports, and passing parameters" which shows my solution to a
> > similar problem.
> >
> > Ephraim Ofir
> >
> > -Original Message-
> > From: Alexandre Rocco [mailto:alel...@gmail.com]
> > Sent: Tuesday, May 24, 2011 11:24 PM
> > To: solr-user@lucene.apache.org
> > Subject: DIH import and postImportDeleteQuery
> >
> > Guys,
> >
> > I am facing a situation in one of our projects that I need to perform a
> > cleanup to remove some documents after we perform an update via DIH.
> > The big issue right now comes from the fact that when we call the DIH
> > with
> > clean=false, the postImportDeleteQuery is not executed.
> >
> > My setup is currently arranged like this:
> > - A SQL Server stored procedure that receives a parameter (specified in
> > the
> > URL) and returns the records to be indexed
> > - The procedure is able to return all the records (for a full-import) or
> > only the updated records (for a delta-import)
> > - This procedure returns valid and deleted records, from this point
> > comes
> > the need to run a postImportDeleteQuery to remove the deleted ones.
> >
> > Everything works fine when I run a full-import, I am running always with
> > clean=true, and then the whole index is rebuilt.
> > When I need to do an incremental update, the records are updated
> > correctly,
> > but the command to delete the other records is not executed.
> >
> > I've tried several combinations, with different results:
> > - Running full-import with clean=false: the records are updated but the
> > ones
> > that need to be deleted stay in the index
> > - Running delta-import with clean=false: the records are updated but the
> > ones that need to be deleted stay in the index
> > - Running delta-import with clean=true: all records are deleted from the
> > index and then only the records returned by the procedure are on the
> > index,
> > except the deleted ones.
> >
> > I don't see any way to achieve my goal, without changing the process
>

RE: DIH import and postImportDeleteQuery

2011-05-25 Thread Dyer, James
The "failure to commit" bug with $deleteDocById can be fixed by applying patch 
SOLR-2492.  This patch also partially fixes the "no updated stats" bug in that 
it increments the count by 1 for every call to $deleteDocById and $deleteDocByQuery.  Note 
that this might result in inaccurate counts if the id given with $deleteDocById 
doesn't exist or is duplicated.  Obviously this is not a complete fix for stats 
using $deleteDocByQuery as this command would normally be used to delete >1 doc 
at a time.
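
To make the counting caveat concrete, here is a toy Python model. It is not the SOLR-2492 patch code; the function name and the sample ids are hypothetical and only illustrate why "one increment per call" can drift from the real number of deleted documents.

```python
def delete_stats(index_ids, delete_by_id_calls):
    """Return (reported, actual) delete counts for a batch of $deleteDocById calls.

    reported: one increment per call, as the patch is described in this thread.
    actual:   number of distinct ids that really existed in the index.
    """
    reported = len(delete_by_id_calls)
    actual = len(set(delete_by_id_calls) & set(index_ids))
    return reported, actual


if __name__ == "__main__":
    index = {"a", "b", "c"}
    # "a" is sent twice and "z" is not in the index at all.
    print(delete_stats(index, ["a", "a", "z"]))
```

With a duplicated id and a missing id, the reported count is higher than the number of documents actually removed, which is the inaccuracy mentioned above.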

The patch is for Trunk but it might work with 3.1 also.  If not, it likely only 
needs minor tweaking.  

The jira ticket is here:  https://issues.apache.org/jira/browse/SOLR-2492

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Alexandre Rocco [mailto:alel...@gmail.com] 
Sent: Wednesday, May 25, 2011 12:54 PM
To: solr-user@lucene.apache.org
Subject: Re: DIH import and postImportDeleteQuery

Hi Ephraim,

Thank you so much for the input.
I was able to find your thread on the archives and got your solution to
work.

In fact, when using $deleteDocById and $skipDoc it worked like a charm. This
feature is very useful, it's a shame it's not properly documented.
The only downside is the one you mentioned that the stats are not updated,
so if I update 13 documents and delete 2, DIH would tell me that only 13
documents were processed. This is bad in my case because I check the end
result to generate an error e-mail if needed.

You also mentioned that if the query contains only deletion records, a
commit would not be automatically executed and it would be necessary to
commit manually.

How can I commit manually via DIH? I was not able to find any references on
the documentation.

Thanks!
Alexandre

On Wed, May 25, 2011 at 5:14 AM, Ephraim Ofir  wrote:

> Search the list for my post "DIH - deleting documents, high performance
> (delta) imports, and passing parameters" which shows my solution to a
> similar problem.
>
> Ephraim Ofir
>
> -Original Message-
> From: Alexandre Rocco [mailto:alel...@gmail.com]
> Sent: Tuesday, May 24, 2011 11:24 PM
> To: solr-user@lucene.apache.org
> Subject: DIH import and postImportDeleteQuery
>
> Guys,
>
> I am facing a situation in one of our projects that I need to perform a
> cleanup to remove some documents after we perform an update via DIH.
> The big issue right now comes from the fact that when we call the DIH
> with
> clean=false, the postImportDeleteQuery is not executed.
>
> My setup is currently arranged like this:
> - A SQL Server stored procedure that receives a parameter (specified in
> the
> URL) and returns the records to be indexed
> - The procedure is able to return all the records (for a full-import) or
> only the updated records (for a delta-import)
> - This procedure returns valid and deleted records, from this point
> comes
> the need to run a postImportDeleteQuery to remove the deleted ones.
>
> Everything works fine when I run a full-import, I am running always with
> clean=true, and then the whole index is rebuilt.
> When I need to do an incremental update, the records are updated
> correctly,
> but the command to delete the other records is not executed.
>
> I've tried several combinations, with different results:
> - Running full-import with clean=false: the records are updated but the
> ones
> that need to be deleted stay in the index
> - Running delta-import with clean=false: the records are updated but the
> ones that need to be deleted stay in the index
> - Running delta-import with clean=true: all records are deleted from the
> index and then only the records returned by the procedure are on the
> index,
> except the deleted ones.
>
> I don't see any way to achieve my goal, without changing the process
> that I
> do to obtain the data.
> Since this is a very complex stored procedure, with tons of joins and
> custom
> processing, I am trying everything to avoid messing with it.
>
> See below a copy of my data-config.xml file. I made it simpler omitting
> all
> the fields, since it's out of scope of the issue:
> 
> 
>  driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
> url="jdbc:sqlserver://myserver;databaseName=mydb;user=username;password=
> password;responseBuffering=adaptive;"
>
> />
> 
>  pk="entityid"
> transformer="RegexTransformer"
> query="EXEC some_stored_procedure ${dataimporter.request.someid}"
> preImportDeleteQuery="status:1" postImportDeleteQuery="status:1"
> >
> 
> 
> 
> 
>
>  pk="entityid"
> transformer="RegexTransformer"
> query="EXEC someother_stored_procedure
> ${dataimporter.request.someotherid}"
> preImportDeleteQuery="status:1" postImportDeleteQuery="status:1"
> >
> 
> 
> 
> 
> 
> 
>
> Any ideas or pointers that might help on this one?
>
> Many thanks,
> Alexandre
>


Re: DIH import and postImportDeleteQuery

2011-05-25 Thread Alexandre Rocco
Hi Ephraim,

Thank you so much for the input.
I was able to find your thread on the archives and got your solution to
work.

In fact, when using $deleteDocById and $skipDoc it worked like a charm. This
feature is very useful; it's a shame it's not properly documented.
The only downside is the one you mentioned that the stats are not updated,
so if I update 13 documents and delete 2, DIH would tell me that only 13
documents were processed. This is bad in my case because I check the end
result to generate an error e-mail if needed.
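
For anyone reading this later, the row-flagging idea can be sketched roughly as below. This is only a Python model of the logic; in DIH it would live in a transformer. The assumption that status 1 marks a deleted record is inferred from the status:1 delete query in my config, and the function name is made up for illustration.

```python
def flag_deleted(row, id_field="entityid", status_field="status", deleted_value=1):
    """Return a copy of the row, flagged for deletion if it is marked as deleted.

    Assumption: a row whose status equals deleted_value should be removed
    from the index instead of being indexed.
    """
    out = dict(row)
    if out.get(status_field) == deleted_value:
        # $deleteDocById asks DIH to delete the document with this id.
        out["$deleteDocById"] = out[id_field]
        # $skipDoc asks DIH not to index this row as a document.
        out["$skipDoc"] = "true"
    return out


if __name__ == "__main__":
    print(flag_deleted({"entityid": 7, "status": 1}))
    print(flag_deleted({"entityid": 8, "status": 0}))
```

Rows that are not marked as deleted pass through unchanged and are indexed as usual.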

You also mentioned that if the query contains only deletion records, a
commit would not be automatically executed and it would be necessary to
commit manually.

How can I commit manually via DIH? I was not able to find any references in
the documentation.
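
One option I can think of, outside of DIH itself, is to send an explicit commit to Solr's regular update handler after the import finishes. A minimal Python sketch follows; the host, port, and handler path are assumptions about a default setup, and the helper name is hypothetical.

```python
import urllib.request

# Assumed location of the update handler; adjust host, port, and path as needed.
UPDATE_URL = "http://localhost:8983/solr/update"


def build_commit_request(update_url=UPDATE_URL):
    """Build (but do not send) a POST request carrying an explicit <commit/>."""
    return urllib.request.Request(
        update_url,
        data=b"<commit/>",
        headers={"Content-Type": "text/xml"},
    )


if __name__ == "__main__":
    req = build_commit_request()
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

The request is only built here, not sent, so the sketch can be tried without a running Solr instance.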

Thanks!
Alexandre

On Wed, May 25, 2011 at 5:14 AM, Ephraim Ofir  wrote:

> Search the list for my post "DIH - deleting documents, high performance
> (delta) imports, and passing parameters" which shows my solution to a
> similar problem.
>
> Ephraim Ofir
>
> -Original Message-
> From: Alexandre Rocco [mailto:alel...@gmail.com]
> Sent: Tuesday, May 24, 2011 11:24 PM
> To: solr-user@lucene.apache.org
> Subject: DIH import and postImportDeleteQuery
>
> Guys,
>
> I am facing a situation in one of our projects that I need to perform a
> cleanup to remove some documents after we perform an update via DIH.
> The big issue right now comes from the fact that when we call the DIH
> with
> clean=false, the postImportDeleteQuery is not executed.
>
> My setup is currently arranged like this:
> - A SQL Server stored procedure that receives a parameter (specified in
> the
> URL) and returns the records to be indexed
> - The procedure is able to return all the records (for a full-import) or
> only the updated records (for a delta-import)
> - This procedure returns valid and deleted records, from this point
> comes
> the need to run a postImportDeleteQuery to remove the deleted ones.
>
> Everything works fine when I run a full-import, I am running always with
> clean=true, and then the whole index is rebuilt.
> When I need to do an incremental update, the records are updated
> correctly,
> but the command to delete the other records is not executed.
>
> I've tried several combinations, with different results:
> - Running full-import with clean=false: the records are updated but the
> ones
> that need to be deleted stay in the index
> - Running delta-import with clean=false: the records are updated but the
> ones that need to be deleted stay in the index
> - Running delta-import with clean=true: all records are deleted from the
> index and then only the records returned by the procedure are on the
> index,
> except the deleted ones.
>
> I don't see any way to achieve my goal, without changing the process
> that I
> do to obtain the data.
> Since this is a very complex stored procedure, with tons of joins and
> custom
> processing, I am trying everything to avoid messing with it.
>
> See below a copy of my data-config.xml file. I made it simpler omitting
> all
> the fields, since it's out of scope of the issue:
> 
> 
>  driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
> url="jdbc:sqlserver://myserver;databaseName=mydb;user=username;password=
> password;responseBuffering=adaptive;"
>
> />
> 
>  pk="entityid"
> transformer="RegexTransformer"
> query="EXEC some_stored_procedure ${dataimporter.request.someid}"
> preImportDeleteQuery="status:1" postImportDeleteQuery="status:1"
> >
> 
> 
> 
> 
>
>  pk="entityid"
> transformer="RegexTransformer"
> query="EXEC someother_stored_procedure
> ${dataimporter.request.someotherid}"
> preImportDeleteQuery="status:1" postImportDeleteQuery="status:1"
> >
> 
> 
> 
> 
> 
> 
>
> Any ideas or pointers that might help on this one?
>
> Many thanks,
> Alexandre
>


RE: DIH import and postImportDeleteQuery

2011-05-25 Thread Ephraim Ofir
Search the list for my post "DIH - deleting documents, high performance
(delta) imports, and passing parameters" which shows my solution to a
similar problem.

Ephraim Ofir

-Original Message-
From: Alexandre Rocco [mailto:alel...@gmail.com] 
Sent: Tuesday, May 24, 2011 11:24 PM
To: solr-user@lucene.apache.org
Subject: DIH import and postImportDeleteQuery

Guys,

I am facing a situation in one of our projects that I need to perform a
cleanup to remove some documents after we perform an update via DIH.
The big issue right now comes from the fact that when we call the DIH
with
clean=false, the postImportDeleteQuery is not executed.

My setup is currently arranged like this:
- A SQL Server stored procedure that receives a parameter (specified in
the
URL) and returns the records to be indexed
- The procedure is able to return all the records (for a full-import) or
only the updated records (for a delta-import)
- This procedure returns valid and deleted records, from this point
comes
the need to run a postImportDeleteQuery to remove the deleted ones.

Everything works fine when I run a full-import, I am running always with
clean=true, and then the whole index is rebuilt.
When I need to do an incremental update, the records are updated
correctly,
but the command to delete the other records is not executed.

I've tried several combinations, with different results:
- Running full-import with clean=false: the records are updated but the
ones
that need to be deleted stay in the index
- Running delta-import with clean=false: the records are updated but the
ones that need to be deleted stay in the index
- Running delta-import with clean=true: all records are deleted from the
index and then only the records returned by the procedure are on the
index,
except the deleted ones.

I don't see any way to achieve my goal, without changing the process
that I
do to obtain the data.
Since this is a very complex stored procedure, with tons of joins and
custom
processing, I am trying everything to avoid messing with it.

See below a copy of my data-config.xml file. I made it simpler omitting
all
the fields, since it's out of scope of the issue:


















Any ideas or pointers that might help on this one?

Many thanks,
Alexandre


DIH import and postImportDeleteQuery

2011-05-24 Thread Alexandre Rocco
Guys,

I am facing a situation in one of our projects where I need to perform a
cleanup to remove some documents after we perform an update via DIH.
The big issue right now comes from the fact that when we call the DIH with
clean=false, the postImportDeleteQuery is not executed.

My setup is currently arranged like this:
- A SQL Server stored procedure that receives a parameter (specified in the
URL) and returns the records to be indexed
- The procedure is able to return all the records (for a full-import) or
only the updated records (for a delta-import)
- This procedure returns both valid and deleted records, hence the need to
run a postImportDeleteQuery to remove the deleted ones.

Everything works fine when I run a full-import: I always run it with
clean=true, so the whole index is rebuilt.
When I need to do an incremental update, the records are updated correctly,
but the command to delete the other records is not executed.

I've tried several combinations, with different results:
- Running full-import with clean=false: the records are updated but the ones
that need to be deleted stay in the index
- Running delta-import with clean=false: the records are updated but the
ones that need to be deleted stay in the index
- Running delta-import with clean=true: all records are deleted from the
index and then only the records returned by the procedure are on the index,
except the deleted ones.
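
For reference, the three requests above can be sketched as URLs like the ones this small Python helper builds. The host, port, and handler path are assumptions for illustration, as is the sample id value; someid is the request parameter that the config reads as ${dataimporter.request.someid}.

```python
from urllib.parse import urlencode

# Assumed DIH handler location; adjust host, port, and path to the real setup.
DIH_URL = "http://localhost:8983/solr/dataimport"


def dih_url(command, clean, someid):
    """Build a DIH request URL for the given command and clean flag."""
    params = [
        ("command", command),
        ("clean", "true" if clean else "false"),
        ("someid", str(someid)),
    ]
    return DIH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # The three combinations listed above, with a made-up id of 42.
    print(dih_url("full-import", False, 42))
    print(dih_url("delta-import", False, 42))
    print(dih_url("delta-import", True, 42))
```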

I don't see any way to achieve my goal, without changing the process that I
do to obtain the data.
Since this is a very complex stored procedure, with tons of joins and custom
processing, I am trying everything to avoid messing with it.

See below a copy of my data-config.xml file. I simplified it by omitting all
the fields, since they are out of scope for this issue:


















Any ideas or pointers that might help on this one?

Many thanks,
Alexandre