[jira] Commented: (SOLR-1426) Allow delta-import to run continuously until aborted

2009-10-01 Thread Abdul Chaudhry (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761289#action_12761289
 ] 

Abdul Chaudhry commented on SOLR-1426:
--

NOTE: the last_index_time is broken with the perpetual patch

I hacked around this by changing the deltaQuery in the data-config.xml file to 
something like this:-

WHERE updated_at > DATE_SUB('${dataimporter.last_index_time}', INTERVAL 10 SECOND)

This is because of the time discrepancy between the sleep and the writer's 
last_index_time.
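The same 10-second margin can be illustrated on the Java side (a standalone 
sketch, not DIH code; it assumes the yyyy-MM-dd HH:mm:ss pattern that 
last_index_time is written with):

  import java.text.SimpleDateFormat;
  import java.util.Date;

  public class SafeLastIndexTime {
    public static void main(String[] args) {
      Date lastIndexTime = new Date();                        // as recorded by the writer
      Date safe = new Date(lastIndexTime.getTime() - 10000L); // back off 10 seconds
      System.out.println(new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(safe));
    }
  }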

However, it looks like delta-import is broken in the latest build of Solr 
trunk (revision 820731): the lastIndexTime in the DataImporter is not populated 
after a delta run, so if you use ${dataimporter.last_index_time} the deltaQuery 
gets the wrong time. 

I am going to wait until delta-import is fixed before I update the patch.


 Allow delta-import to run continuously until aborted
 ---

 Key: SOLR-1426
 URL: https://issues.apache.org/jira/browse/SOLR-1426
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 1.4
Reporter: Abdul Chaudhry
Assignee: Noble Paul
 Fix For: 1.5

 Attachments: delta-import-perpetual.patch


 Modify the delta-import so that it takes a perpetual flag that makes it run 
 continuously until it is aborted.
 http://localhost:8985/solr/select/?command=delta-import&clean=false&qt=/dataimport&commit=true&perpetual=true
 perpetual means the delta-import will keep running, pausing for a few seconds 
 between query runs.
 The only way to stop the delta-import will be to explicitly issue an abort, 
 like so:-
 http://localhost:8985/solr/tickets/select/?command=abort

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1426) Allow delta-import to run continuously until aborted

2009-10-01 Thread Abdul Chaudhry (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761300#action_12761300
 ] 

Abdul Chaudhry commented on SOLR-1426:
--

The SOLR-783 fix seems to force you to use the entity name with the 
last_index_time.

My fix for this was to change the deltaQuery like so:-

WHERE updated_at > DATE_SUB('${dataimporter.[name of entity].last_index_time}', INTERVAL 10 SECOND)



 Allow delta-import to run continuously until aborted
 ---

 Key: SOLR-1426
 URL: https://issues.apache.org/jira/browse/SOLR-1426
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 1.4
Reporter: Abdul Chaudhry
Assignee: Noble Paul
 Fix For: 1.5

 Attachments: delta-import-perpetual.patch


 Modify the delta-import so that it takes a perpetual flag that makes it run 
 continuously until it is aborted.
 http://localhost:8985/solr/select/?command=delta-import&clean=false&qt=/dataimport&commit=true&perpetual=true
 perpetual means the delta-import will keep running, pausing for a few seconds 
 between query runs.
 The only way to stop the delta-import will be to explicitly issue an abort, 
 like so:-
 http://localhost:8985/solr/tickets/select/?command=abort

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1426) Allow delta-import to run continuously until aborted

2009-09-14 Thread Abdul Chaudhry (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755077#action_12755077
 ] 

Abdul Chaudhry commented on SOLR-1426:
--

The perpetual option only makes sense for one command: delta-import. I could 
not see a compelling use case for using perpetual with any other command.

The abort should stop any in-flight delta-import, which is the current 
behaviour with the patch.

The sleep interval should be set using something like perpetual.delay and 
default to a reasonable value such as 3 seconds.
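Taken together, those three points suggest a loop like the following (a 
minimal Java sketch with illustrative names, not the patch itself; 
perpetual.delay is only the parameter suggested above):

  import java.util.concurrent.atomic.AtomicBoolean;

  public class PerpetualDeltaImport {
    private final AtomicBoolean aborted = new AtomicBoolean(false);

    public void run(long delayMs) throws InterruptedException {
      while (!aborted.get()) { // an abort command flips this flag
        runDeltaImport();      // one ordinary delta-import pass
        Thread.sleep(delayMs); // perpetual.delay, defaulting to 3000 ms
      }
    }

    public void abort() { aborted.set(true); }      // stops an in-flight delta-import
    private void runDeltaImport() { /* delegate to the DataImporter */ }
  }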


 Allow delta-import to run continuously until aborted
 ---

 Key: SOLR-1426
 URL: https://issues.apache.org/jira/browse/SOLR-1426
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 1.4
Reporter: Abdul Chaudhry
Assignee: Noble Paul
 Fix For: 1.5

 Attachments: delta-import-perpetual.patch


 Modify the delta-import so that it takes a perpetual flag that makes it run 
 continuously until it is aborted.
 http://localhost:8985/solr/select/?command=delta-import&clean=false&qt=/dataimport&commit=true&perpetual=true
 perpetual means the delta-import will keep running, pausing for a few seconds 
 between query runs.
 The only way to stop the delta-import will be to explicitly issue an abort, 
 like so:-
 http://localhost:8985/solr/tickets/select/?command=abort

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1426) Allow delta-import to run continuously until aborted

2009-09-13 Thread Abdul Chaudhry (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754816#action_12754816
 ] 

Abdul Chaudhry commented on SOLR-1426:
--

You can run a crontab every minute but I need near real-time changes mirrored 
from a set of tables in a database to a search index. 

You should be aware that Lucene 2.9 includes what it calls near real-time 
search capabilities. If those are included in Solr 1.4, the use case for 
delta-import will probably change from running every few hours or minutes 
(which is probably what you are used to right now) to running every few 
seconds. At that point a crontab running every minute is too long to wait, and 
writing a script to call curl every few seconds will seem like an excessive 
use of system resources.

So, in answer to your question, it's probably not a common use case now, but 
with Lucene 2.9 it will become one.

Anyway, it's your call - take it or leave it.



 Allow delta-import to run continuously until aborted
 ---

 Key: SOLR-1426
 URL: https://issues.apache.org/jira/browse/SOLR-1426
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 1.4
Reporter: Abdul Chaudhry
Assignee: Noble Paul
 Fix For: 1.5

 Attachments: delta-import-perpetual.patch


 Modify the delta-import so that it takes a perpetual flag that makes it run 
 continuously until it is aborted.
 http://localhost:8985/solr/select/?command=delta-import&clean=false&qt=/dataimport&commit=true&perpetual=true
 perpetual means the delta-import will keep running, pausing for a few seconds 
 between query runs.
 The only way to stop the delta-import will be to explicitly issue an abort, 
 like so:-
 http://localhost:8985/solr/tickets/select/?command=abort

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1426) Allow delta-import to run continuously until aborted

2009-09-11 Thread Abdul Chaudhry (JIRA)
Allow delta-import to run continuously until aborted
---

 Key: SOLR-1426
 URL: https://issues.apache.org/jira/browse/SOLR-1426
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 1.4
Reporter: Abdul Chaudhry
 Fix For: 1.4


Modify the delta-import so that it takes a perpetual flag that makes it run 
continuously until it is aborted.

http://search-master.fansnap.com:8985/solr/tickets/select/?command=delta-import&clean=false&qt=/dataimport&commit=true&perpetual=true

perpetual means the delta-import will keep running, pausing for a few seconds 
between query runs.

The only way to stop the delta-import will be to explicitly issue an abort, like so:-

http://search-master.fansnap.com:8985/solr/tickets/select/?command=abort


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1426) Allow delta-import to run continuously until aborted

2009-09-11 Thread Abdul Chaudhry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abdul Chaudhry updated SOLR-1426:
-

Attachment: delta-import-perpetual.patch

Uploaded a patch that implements this feature.

Ran all unit tests on my tree and they pass.

The only thing I have hard-coded is the sleep interval, which is:-

Thread.sleep(3000)

This should probably be configurable, as sketched below.
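One way to do that (a sketch; perpetual.delay is only the name floated in a 
later comment, and SolrParams.getInt is the existing accessor on 
org.apache.solr.common.params.SolrParams):

  import org.apache.solr.common.params.SolrParams;

  public class PerpetualDelay {
    // Read the interval from the request, keeping the patch's 3000 ms default.
    public static int sleepIntervalMs(SolrParams params) {
      return params.getInt("perpetual.delay", 3000);
    }
  }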

 Allow delta-import to run continuously until aborted
 ---

 Key: SOLR-1426
 URL: https://issues.apache.org/jira/browse/SOLR-1426
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 1.4
Reporter: Abdul Chaudhry
 Fix For: 1.4

 Attachments: delta-import-perpetual.patch


 Modify the delta-import so that it takes a perpetual flag that makes it run 
 continuously until it is aborted.
 http://search-master.fansnap.com:8985/solr/tickets/select/?command=delta-import&clean=false&qt=/dataimport&commit=true&perpetual=true
 perpetual means the delta-import will keep running, pausing for a few seconds 
 between query runs.
 The only way to stop the delta-import will be to explicitly issue an abort, 
 like so:-
 http://search-master.fansnap.com:8985/solr/tickets/select/?command=abort

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1426) Allow delta-import to run continuously until aborted

2009-09-11 Thread Abdul Chaudhry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abdul Chaudhry updated SOLR-1426:
-

Description: 
Modify the delta-import so that it takes a perpetual flag that makes it run 
continuously until it is aborted.

http://localhost:8985/solr/tickets/select/?command=delta-import&clean=false&qt=/dataimport&commit=true&perpetual=true

perpetual means the delta-import will keep running, pausing for a few seconds 
between query runs.

The only way to stop the delta-import will be to explicitly issue an abort, like so:-

http://localhost:8985/solr/tickets/select/?command=abort


  was:
Modify the delta-import so that it takes a perpetual flag that makes it run 
continuously until it is aborted.

http://search-master.fansnap.com:8985/solr/tickets/select/?command=delta-import&clean=false&qt=/dataimport&commit=true&perpetual=true

perpetual means the delta-import will keep running, pausing for a few seconds 
between query runs.

The only way to stop the delta-import will be to explicitly issue an abort, like so:-

http://search-master.fansnap.com:8985/solr/tickets/select/?command=abort



 Allow delta-import to run continuously until aborted
 ---

 Key: SOLR-1426
 URL: https://issues.apache.org/jira/browse/SOLR-1426
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 1.4
Reporter: Abdul Chaudhry
 Fix For: 1.4

 Attachments: delta-import-perpetual.patch


 Modify the delta-import so that it takes a perpetual flag that makes it run 
 continuously until it is aborted.
 http://localhost:8985/solr/tickets/select/?command=delta-import&clean=false&qt=/dataimport&commit=true&perpetual=true
 perpetual means the delta-import will keep running, pausing for a few seconds 
 between query runs.
 The only way to stop the delta-import will be to explicitly issue an abort, 
 like so:-
 http://localhost:8985/solr/tickets/select/?command=abort

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-236) Field collapsing

2009-09-04 Thread Abdul Chaudhry (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751615#action_12751615
 ] 

Abdul Chaudhry commented on SOLR-236:
-

In case this helps you fix your unit tests: I fixed them by changing the 
CollapseFilter constructor that's used for testing to take a StringIndex, like 
so:-

-  CollapseFilter(int collapseMaxDocs, int collapseTreshold) {
+  CollapseFilter(int collapseMaxDocs, int collapseTreshold, FieldCache.StringIndex index) {
+    this.collapseIndex = index;

and then I changed the unit test cases to move the values into a StringIndex in 
CollapseFilterTest, like so:-

   public void testNormalCollapse_collapseThresholdOne() {
-    collapseFilter = new CollapseFilter(Integer.MAX_VALUE, 1);
+    String[] values = new String[]{"a", "b", "c"};
+    int[] order = new int[]{0, 1, 0, 2, 1, 0, 1};
+    FieldCache.StringIndex index = new FieldCache.StringIndex(order, values);
+    int[] docIds = new int[]{1, 2, 0, 3, 4, 5, 6};
+
+    collapseFilter = new CollapseFilter(Integer.MAX_VALUE, 1, index);

-    String[] values = new String[]{"a", "b", "a", "c", "b", "a", "b"};
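To make the two parallel arrays concrete, here is a tiny standalone model of 
the lookup/order relationship behind FieldCache.StringIndex (plain Java, no 
Lucene; the data matches the test above, so the doc values come out as a, b, 
a, c, b, a, b):

  public class StringIndexModel {
    public static void main(String[] args) {
      String[] lookup = {"a", "b", "c"};   // the distinct field values
      int[] order = {0, 1, 0, 2, 1, 0, 1}; // doc i has value lookup[order[i]]
      for (int doc = 0; doc < order.length; doc++) {
        System.out.println("doc " + doc + " -> " + lookup[order[doc]]);
      }
    }
  }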



 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, 
 SOLR-236_collapsing.patch


 This patch includes a new feature called Field collapsing.
 It is used to collapse a group of results with a similar value for a given 
 field to a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated "more documents from 
 this site" link. See also Duplicate detection.
 http://www.fastsearch.com/glossary.aspx?m=48&amid=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for the current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling correction are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-236) Field collapsing

2009-09-03 Thread Abdul Chaudhry (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751243#action_12751243
 ] 

Abdul Chaudhry commented on SOLR-236:
-

I have some ideas for performance improvements.

I noticed that the code fetches the field cache twice, once for the collapse 
and then for the response object, assuming you asked for the info count in the 
response.

That seems expensive, especially for real-time content.

I think it's better to use FieldCache.StringIndex instead of returning a large 
string array, and to keep it around for both the collapse and the response 
object.

I changed the code so that I keep the cache around, like so:

  /**
   * Keep the field cached for the collapsed fields for the response object as well.
   */
  private FieldCache.StringIndex collapseIndex;


When collapsing, you can get the current value using something like this and 
remove the code that passes the array around:

  int currentId = i.nextDoc();
  String currentValue = collapseIndex.lookup[collapseIndex.order[currentId]];

When building the response for the info count, you can reference the same cache 
like so:-

  if (collapseInfoCount) {
    resCount.add(collapseFieldType.indexedToReadable(
        collapseIndex.lookup[collapseIndex.order[id]]), count);
  }

I also added timing for the cache access, as it could be slow if you are doing 
a lot of updates.

I have added code for displaying selected fields for the duplicates, but it's 
difficult to submit. I hope this gets committed, as it's hard to submit a patch 
against code that is not in svn - I cannot submit a patch to a patch to a patch 
.. you get the idea.


 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, 
 SOLR-236_collapsing.patch


 This patch includes a new feature called Field collapsing.
 It is used to collapse a group of results with a similar value for a given 
 field to a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated "more documents from 
 this site" link. See also Duplicate detection.
 http://www.fastsearch.com/glossary.aspx?m=48&amid=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for the current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling correction are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-09-03 Thread Abdul Chaudhry (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751243#action_12751243
 ] 

Abdul Chaudhry edited comment on SOLR-236 at 9/3/09 5:56 PM:
-

I have some ideas for performance improvements.

I noticed that the code fetches the field cache twice, once for the collapse 
and then for the response object, assuming you asked for the info count in the 
response.

That seems expensive, especially for real-time content.

I think it's better to use FieldCache.StringIndex instead of returning a large 
string array, and to keep it around for both the collapse and the response 
object.

I changed the code so that I keep the cache around, like so:

  /**
   * Keep the field cached for the collapsed fields for the response object as well.
   */
  private FieldCache.StringIndex collapseIndex;


To get the index, use something like this instead of getting the string array 
for all docs:

  collapseIndex = FieldCache.DEFAULT.getStringIndex(searcher.getReader(), collapseField);

When collapsing, you can get the current value using something like this and 
remove the code that passes the array around:

  int currentId = i.nextDoc();
  String currentValue = collapseIndex.lookup[collapseIndex.order[currentId]];

When building the response for the info count, you can reference the same cache 
like so:-

  if (collapseInfoCount) {
    resCount.add(collapseFieldType.indexedToReadable(
        collapseIndex.lookup[collapseIndex.order[id]]), count);
  }

I also added timing for the cache access, as it could be slow if you are doing 
a lot of updates.

I have added code for displaying selected fields for the duplicates, but it's 
difficult to submit. I hope this gets committed, as it's hard to submit a patch 
against code that is not in svn - I cannot submit a patch to a patch to a patch 
.. you get the idea.



  was (Author: abdollar):
I have some ideas for performance improvements.

I noticed that the code fetches the field cache twice, once for the collapse 
and then for the response object, assuming you asked for the info count in the 
response.

That seems expensive, especially for real-time content.

I think it's better to use FieldCache.StringIndex instead of returning a large 
string array, and to keep it around for both the collapse and the response 
object.

I changed the code so that I keep the cache around, like so:

  /**
   * Keep the field cached for the collapsed fields for the response object as well.
   */
  private FieldCache.StringIndex collapseIndex;


When collapsing, you can get the current value using something like this and 
remove the code that passes the array around:

  int currentId = i.nextDoc();
  String currentValue = collapseIndex.lookup[collapseIndex.order[currentId]];

When building the response for the info count, you can reference the same cache 
like so:-

  if (collapseInfoCount) {
    resCount.add(collapseFieldType.indexedToReadable(
        collapseIndex.lookup[collapseIndex.order[id]]), count);
  }

I also added timing for the cache access, as it could be slow if you are doing 
a lot of updates.

I have added code for displaying selected fields for the duplicates, but it's 
difficult to submit. I hope this gets committed, as it's hard to submit a patch 
against code that is not in svn - I cannot submit a patch to a patch to a patch 
.. you get the idea.

  
 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, 
 SOLR-236_collapsing.patch


 This patch includes a new feature called Field collapsing.
 It is used to collapse a group of results with a similar value for a given 
 field to a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated "more documents from 
 this site" link. See also Duplicate detection.
 

[jira] Created: (SOLR-1391) The XPath field in the XPathEntityResolver should use the resolver to replace possible tokens

2009-08-27 Thread Abdul Chaudhry (JIRA)
The XPath field in the XPathEntityResolver should use the resolver to replace 
possible tokens
-

 Key: SOLR-1391
 URL: https://issues.apache.org/jira/browse/SOLR-1391
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 1.4
Reporter: Abdul Chaudhry


If you have a data-import configuration that nests an entity that includes an 
XPath with an XPathEntityProcessor - like so:-

<entity name="a" ...etc
        dataSource="...">
  <field column="xpath_value"/>

  <entity name="b"
          dataSource="filereader"
          processor="XPathEntityProcessor"
          ...etc >
    <field column="my_field" xpath="${a.xpath_value}"/>
  </entity>

</entity>

This will fail with an error like so:

Caused by: java.lang.RuntimeException: xpath must start with '/' : ${a.xpath_value}

We should allow the xpath to be replaced with the token from entity a



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1391) The XPath field in the XPathEntityResolver should use the resolver to replace possible tokens

2009-08-27 Thread Abdul Chaudhry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abdul Chaudhry updated SOLR-1391:
-

Attachment: xpath.patch

The fix is simple from what I can tell, and I have updated the patch. I just 
used resolver.replaceTokens on the xpath field.
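For anyone unfamiliar with what replaceTokens does here, a miniature 
standalone illustration of the ${...} substitution (plain Java, not DIH's 
resolver; the resolved value is hypothetical):

  import java.util.HashMap;
  import java.util.Map;
  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  public class ReplaceTokensSketch {
    public static void main(String[] args) {
      Map<String, String> resolved = new HashMap<String, String>();
      resolved.put("a.xpath_value", "/root/item/title"); // hypothetical value from entity a

      Matcher m = Pattern.compile("\\$\\{([^}]+)\\}").matcher("${a.xpath_value}");
      StringBuffer out = new StringBuffer();
      while (m.find()) { // swap each ${...} token for its resolved value, if any
        String value = resolved.containsKey(m.group(1)) ? resolved.get(m.group(1)) : m.group(0);
        m.appendReplacement(out, Matcher.quoteReplacement(value));
      }
      m.appendTail(out);
      System.out.println(out); // /root/item/title, which now starts with '/'
    }
  }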

 The XPath field in the XPathEntityResolver should use the resolver to replace 
 possible tokens
 -

 Key: SOLR-1391
 URL: https://issues.apache.org/jira/browse/SOLR-1391
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 1.4
Reporter: Abdul Chaudhry
 Attachments: xpath.patch


 If you have a data-import configuration that nests an entity that includes an 
 XPath with an XPathEntityProcessor - like so:-
 <entity name="a" ...etc
         dataSource="...">
   <field column="xpath_value"/>
   <entity name="b"
           dataSource="filereader"
           processor="XPathEntityProcessor"
           ...etc >
     <field column="my_field" xpath="${a.xpath_value}"/>
   </entity>
 </entity>
 This will fail with an error like so:
 Caused by: java.lang.RuntimeException: xpath must start with '/' : ${a.xpath_value}
 We should allow the xpath to be replaced with the token from entity a

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1262) DIH needs support for prepared statements

2009-07-20 Thread Abdul Chaudhry (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733138#action_12733138
 ] 

Abdul Chaudhry commented on SOLR-1262:
--

I could try to use stored procedures instead of trying to get prepared 
statements to work in the DIH - however, that would mean most of the logic 
would have to live in SQL, and I hate SQL.

 DIH needs support for prepared statements 
 --

 Key: SOLR-1262
 URL: https://issues.apache.org/jira/browse/SOLR-1262
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 1.3
 Environment: linux
 mysql
Reporter: Abdul Chaudhry
Assignee: Noble Paul
Priority: Minor

 During an indexing run we noticed that we were spending a lot of time 
 creating and tearing down queries in MySQL.
 The queries we are using are complex and involve joins spanning 
 multiple tables.
 We should support prepared statements in the data import handler via the 
 data-config.xml file - for those databases that support prepared statements.
 We could add a new attribute on the entity element in dataConfig - say pquery 
 or preparedQuery - then pass the prepared statement and have the values 
 filled in for each row using a placeholder - like a '?' or something else.
 I would probably start by hacking the JdbcDataSource class to try a test, but 
 I was wondering if anyone had experienced this or had any suggestions, or if 
 there is something in the works that I missed - I couldn't find any other 
 bugs mentioning prepared statements for performance.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1262) DIH needs support for prepared statements

2009-07-06 Thread Abdul Chaudhry (JIRA)
DIH needs support for prepared statements 
--

 Key: SOLR-1262
 URL: https://issues.apache.org/jira/browse/SOLR-1262
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 1.3
 Environment: linux
mysql
Reporter: Abdul Chaudhry
Priority: Critical


During an indexing run we noticed that we were spending a lot of time creating 
and tearing down queries in MySQL.

The queries we are using are complex and involve joins spanning multiple 
tables.

We should support prepared statements in the data import handler via the 
data-config.xml file - for those databases that support prepared statements.

We could add a new attribute on the entity element in dataConfig - say pquery 
or preparedQuery - then pass the prepared statement and have the values filled 
in for each row using a placeholder - like a '?' or something else.

I would probably start by hacking the JdbcDataSource class to try a test, but 
I was wondering if anyone had experienced this or had any suggestions, or if 
there is something in the works that I missed - I couldn't find any other bugs 
mentioning prepared statements for performance.
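For reference, the generic JDBC pattern the proposed pquery/preparedQuery 
attribute would map to (a standalone sketch, not DIH code; the connection URL, 
table, and column are illustrative):

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.PreparedStatement;
  import java.sql.ResultSet;
  import java.sql.SQLException;

  public class PreparedDeltaQuery {
    public static void main(String[] args) throws SQLException {
      Connection con = DriverManager.getConnection(
          "jdbc:mysql://localhost/db", "user", "pass"); // illustrative URL
      // Prepare once; the database parses and plans the statement a single time.
      PreparedStatement ps = con.prepareStatement(
          "SELECT * FROM items WHERE id = ?");          // the '?' placeholder from the proposal
      for (long id : new long[]{1, 2, 3}) {
        ps.setLong(1, id);                              // only the bound value changes per row
        ResultSet rs = ps.executeQuery();
        while (rs.next()) { /* map columns to Solr fields */ }
        rs.close();
      }
      ps.close();
      con.close();
    }
  }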


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.