[jira] [Commented] (SOLR-6666) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents

2015-01-06 Thread Liram Vardi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265851#comment-14265851
 ] 

Liram Vardi commented on SOLR-:
---

You are welcome!
Thanks :-)

> Dynamic copy fields are considering all dynamic fields, causing a significant 
> performance impact on indexing documents
> --
>
> Key: SOLR-
> URL: https://issues.apache.org/jira/browse/SOLR-
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis, update
> Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 
> specific CopyFields for dynamic fields, but without wildcards (the fields are 
> dynamic, the copy directive is not)
>Reporter: Liram Vardi
>Assignee: Erick Erickson
> Fix For: 5.0, Trunk
>
> Attachments: SOLR-.patch, SOLR-.patch, SOLR-.patch, 
> SOLR-.patch
>
>
> Result:
> After applying a fix for this issue, our tests show an improvement of more 
> than 40 percent in insertion performance.
> Explanation:
> Using a JVM profiler, we found a CPU bottleneck in the Solr indexing process. 
> The bottleneck is in org.apache.solr.schema.IndexSchema, in the following 
> method, getCopyFieldsList():
> {code:title=getCopyFieldsList() |borderStyle=solid}
> final List<CopyField> result = new ArrayList<>();
> // Linear scan: every dynamic copy rule is tested against the source field.
> for (DynamicCopy dynamicCopy : dynamicCopyFields) {
>   if (dynamicCopy.matches(sourceField)) {
>     result.add(new CopyField(getField(sourceField),
>         dynamicCopy.getTargetField(sourceField), dynamicCopy.maxChars));
>   }
> }
> // Constant-time hash lookup for the fixed (non-dynamic) copy fields.
> List<CopyField> fixedCopyFields = copyFieldsMap.get(sourceField);
> if (null != fixedCopyFields) {
>   result.addAll(fixedCopyFields);
> }
> {code}
> This method finds, for a given source field, all of its copyFields (that is, 
> all destinations to which Solr needs to copy the field). 
> The first part of the method is the most expensive step: it takes O(n) time, 
> where n is the size of the "dynamicCopyFields" array.
> The second part is just a hash lookup, which takes O(1) time. 
> Our schema contains more than 500 copyFields, but only 70 of them are 
> "indexed" fields. 
> We also have one dynamic field with a wildcard ( * ), which "catches" the 
> rest of the document fields. 
> As a result, we have more than 400 copyFields that are based on this 
> dynamicField, but all of them except one are fixed (i.e. they do not contain 
> any wildcard).
> For some reason, the copyField registration procedure defines those 400 
> fields as "DynamicCopyField" and stores them in the "dynamicCopyFields" 
> array. 
> This makes getCopyFieldsList() very expensive (in CPU terms) without any 
> justification: none of those 400 copyFields is a glob, so none of them needs 
> pattern matching against the input field. They can all be stored in 
> "fixedCopyFields".
> Only copyFields with asterisks need this "special" treatment, and they are 
> (especially in our case) pretty rare.  
> Therefore, we created a patch which fixes this problem by changing the 
> registerCopyField() procedure.
> The tests we conducted show no change in the indexing results. 
> Moreover, the fix still passes the class's unit tests (i.e. 
> IndexSchemaTest.java).
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6666) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents

2014-12-21 Thread Liram Vardi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255107#comment-14255107
 ] 

Liram Vardi commented on SOLR-:
---

Hi Erick,

Thank you for the comprehensive review of this post.

Indeed, my patch causes a loading failure and also causes 
TestFieldCollectionResource to fail when the “fail_dynamic” example is used.
(TestManagedStopFilterFactory and TestManagedSynonymFilterFactory failed in my 
environment regardless of the patch.)
Based on the following wiki page, though, I was sure that this example was 
invalid: 
https://cwiki.apache.org/confluence/display/solr/Copying+Fields 
However, based on your explanation, I tried to find a combined solution that 
satisfies both cases, as you suggested.

The case that I am trying to solve is the one in which the source is not 
explicit (that is, it has no field definition and is only instantiated by 
matching a dynamic field – the second case that you described in your 
response). 
So, let us put things in order.
Let us assume three possible types for each copyField source and destination:
1)  Explicit – the field has an explicit “field” definition.
2)  Glob – the field contains an asterisk in its copyField definition and 
matches one (or more) of the field definitions (dynamic or static).
3)  Reference – the copyField refers to some dynamic field, but contains no 
asterisk.

Each copyField’s source and destination belongs to one of these types.
When Solr reads the schema, it eventually divides the copy fields into two 
groups: fixedCopyFields and dynamicCopyFields. 
As I explained before, a lookup in “fixedCopyFields” is much less expensive 
than a scan of “dynamicCopyFields”.
Now, let us define the following decision table:
||Case||Source||Destination||Decision||
|1|Explicit|Explicit|fixedCopyFields|
|2|Explicit|Glob|Error!|
|3|Explicit|Reference|dynamicCopyFields|
|4|Glob|Explicit|dynamicCopyFields|
|5|Glob|Glob|dynamicCopyFields|
|6|Glob|Reference|dynamicCopyFields|
|*7*|*Reference*|*Explicit*|*fixedCopyFields*|
|8|Reference|Glob|dynamicCopyFields|
|9|Reference|Reference|dynamicCopyFields|

As you can see, until now Solr put copy fields in the “static” hash only for 
case “1” (source and destination are both explicit).
In the next version of the SOLR- patch, I refactored the “if” statement that 
divides those copyFields.
In the previous version of the patch, the code threw an exception for case 8 
(i.e. the fail_dynamic example).
Now, after the refactoring, case “8” is legal again, and case “7”, which is the 
one that I am trying to solve, sends those copyFields to the “fixedCopyFields” 
map.

Another open question is whether cases 3 and 9 also need to stay as 
“DynamicCopyFields”, or whether we can make the update more efficient by also 
moving them to the “static” map. Currently the patch does not change this.
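
For illustration only, the decision table above can be written as a small 
function. This is a sketch and not the code from the attached patch: the class 
name, the enum names and the decide() method are hypothetical and do not exist 
in Solr.

```java
// Hypothetical sketch of the decision table; none of these names are Solr APIs.
class CopyFieldDecisionSketch {
  // The three types a copyField source or destination can have.
  enum Kind { EXPLICIT, GLOB, REFERENCE }

  // Where the copyField ends up after the schema is read.
  enum Target { FIXED_COPY_FIELDS, DYNAMIC_COPY_FIELDS, ERROR }

  static Target decide(Kind source, Kind dest) {
    // Case 2: an explicit source cannot be copied to a glob destination.
    if (source == Kind.EXPLICIT && dest == Kind.GLOB) {
      return Target.ERROR;
    }
    // Cases 1 and 7: a non-glob source with an explicit destination
    // can be found with a plain hash lookup.
    if (source != Kind.GLOB && dest == Kind.EXPLICIT) {
      return Target.FIXED_COPY_FIELDS;
    }
    // Cases 3, 4, 5, 6, 8 and 9 stay in the scanned list.
    return Target.DYNAMIC_COPY_FIELDS;
  }
}
```

With this sketch, decide(Kind.REFERENCE, Kind.EXPLICIT) corresponds to case 7 
and decide(Kind.REFERENCE, Kind.GLOB) to case 8.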

The second version of the patch is attached.  

Thanks!


> Dynamic copy fields are considering all dynamic fields, causing a significant 
> performance impact on indexing documents
> --
>
> Key: SOLR-
> URL: https://issues.apache.org/jira/browse/SOLR-
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis, update
> Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 
> specific CopyFields for dynamic fields, but without wildcards (the fields are 
> dynamic, the copy directive is not)
>Reporter: Liram Vardi
>Assignee: Erick Erickson
> Attachments: SOLR-.patch
>
>

[jira] [Commented] (SOLR-6666) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents

2014-11-26 Thread Liram Vardi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225926#comment-14225926
 ] 

Liram Vardi commented on SOLR-:
---

Hi all,
Did anyone have a chance to take a look on this?
Thanks

> Dynamic copy fields are considering all dynamic fields, causing a significant 
> performance impact on indexing documents
> --
>
> Key: SOLR-
> URL: https://issues.apache.org/jira/browse/SOLR-
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis, update
> Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 
> specific CopyFields for dynamic fields, but without wildcards (the fields are 
> dynamic, the copy directive is not)
>Reporter: Liram Vardi
>Assignee: Erick Erickson
> Attachments: SOLR-.patch
>
>



[jira] [Commented] (SOLR-6666) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents

2014-10-29 Thread Liram Vardi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188411#comment-14188411
 ] 

Liram Vardi commented on SOLR-:
---

Thanks :-)
The patch is attached.

> Dynamic copy fields are considering all dynamic fields, causing a significant 
> performance impact on indexing documents
> --
>
> Key: SOLR-
> URL: https://issues.apache.org/jira/browse/SOLR-
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis, update
> Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 
> specific CopyFields for dynamic fields, but without wildcards (the fields are 
> dynamic, the copy directive is not)
>Reporter: Liram Vardi
> Attachments: SOLR-.patch
>
>



[jira] [Closed] (SOLR-6667) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents

2014-10-29 Thread Liram Vardi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liram Vardi closed SOLR-6667.
-
Resolution: Duplicate

Duplicate of SOLR-

> Dynamic copy fields are considering all dynamic fields, causing a significant 
> performance impact on indexing documents
> --
>
> Key: SOLR-6667
> URL: https://issues.apache.org/jira/browse/SOLR-6667
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis, update
>Affects Versions: 4.8
> Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 
> specific CopyFields for dynamic fields, but without wildcards (the fields are 
> dynamic, the copy directive is not)
>Reporter: Liram Vardi
> Attachments: SOLR-6667.patch
>
>



[jira] [Updated] (SOLR-6666) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents

2014-10-29 Thread Liram Vardi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liram Vardi updated SOLR-:
--
Attachment: SOLR-.patch

> Dynamic copy fields are considering all dynamic fields, causing a significant 
> performance impact on indexing documents
> --
>
> Key: SOLR-
> URL: https://issues.apache.org/jira/browse/SOLR-
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis, update
> Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 
> specific CopyFields for dynamic fields, but without wildcards (the fields are 
> dynamic, the copy directive is not)
>Reporter: Liram Vardi
> Attachments: SOLR-.patch
>
>



[jira] [Updated] (SOLR-6666) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents

2014-10-29 Thread Liram Vardi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liram Vardi updated SOLR-:
--
Description: 
Result:
After applying a fix for this issue, our tests show an improvement of more than 
40 percent in insertion performance.

Explanation:

Using a JVM profiler, we found a CPU bottleneck in the Solr indexing process. 
The bottleneck is in org.apache.solr.schema.IndexSchema, in the following 
method, getCopyFieldsList():

{code:title=getCopyFieldsList() |borderStyle=solid}
final List<CopyField> result = new ArrayList<>();
// Linear scan: every dynamic copy rule is tested against the source field.
for (DynamicCopy dynamicCopy : dynamicCopyFields) {
  if (dynamicCopy.matches(sourceField)) {
    result.add(new CopyField(getField(sourceField),
        dynamicCopy.getTargetField(sourceField), dynamicCopy.maxChars));
  }
}
// Constant-time hash lookup for the fixed (non-dynamic) copy fields.
List<CopyField> fixedCopyFields = copyFieldsMap.get(sourceField);
if (null != fixedCopyFields) {
  result.addAll(fixedCopyFields);
}
{code}

This method finds, for a given source field, all of its copyFields (that is, 
all destinations to which Solr needs to copy the field). 
The first part of the method is the most expensive step: it takes O(n) time, 
where n is the size of the "dynamicCopyFields" array.
The second part is just a hash lookup, which takes O(1) time. 

Our schema contains more than 500 copyFields, but only 70 of them are "indexed" 
fields. 
We also have one dynamic field with a wildcard ( * ), which "catches" the rest 
of the document fields. 
As a result, we have more than 400 copyFields that are based on this 
dynamicField, but all of them except one are fixed (i.e. they do not contain 
any wildcard).

For some reason, the copyField registration procedure defines those 400 
fields as "DynamicCopyField" and stores them in the "dynamicCopyFields" 
array. 
This makes getCopyFieldsList() very expensive (in CPU terms) without any 
justification: none of those 400 copyFields is a glob, so none of them needs 
pattern matching against the input field. They can all be stored in 
"fixedCopyFields".
Only copyFields with asterisks need this "special" treatment, and they are 
(especially in our case) pretty rare.  

Therefore, we created a patch which fixes this problem by changing the 
registerCopyField() procedure.
The tests we conducted show no change in the indexing results. 
Moreover, the fix still passes the class's unit tests (i.e. 
IndexSchemaTest.java).
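
To illustrate the idea, here is a simplified, self-contained sketch of 
registering copy rules so that only sources containing an asterisk are scanned 
at lookup time. It is not the attached patch and not Solr code: the class and 
method names are hypothetical, destinations are plain strings rather than 
CopyField objects, and the glob matcher assumes a single leading or trailing 
asterisk.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the copy-field bookkeeping; not the real IndexSchema.
class CopyFieldLookupSketch {
  // Glob rules {sourcePattern, destination}: scanned for every source field.
  private final List<String[]> globRules = new ArrayList<>();
  // Exact-name rules: found with a single hash lookup.
  private final Map<String, List<String>> fixedRules = new HashMap<>();

  // Only a source containing '*' needs pattern matching at lookup time.
  void register(String source, String dest) {
    if (source.contains("*")) {
      globRules.add(new String[] {source, dest});
    } else {
      fixedRules.computeIfAbsent(source, k -> new ArrayList<>()).add(dest);
    }
  }

  // Simplified glob: assumes at most one '*', at the start or the end.
  static boolean globMatches(String pattern, String name) {
    if (pattern.equals("*")) {
      return true;
    }
    if (pattern.startsWith("*")) {
      return name.endsWith(pattern.substring(1));
    }
    if (pattern.endsWith("*")) {
      return name.startsWith(pattern.substring(0, pattern.length() - 1));
    }
    return pattern.equals(name);
  }

  // O(number of glob rules) scan, followed by an O(1) hash lookup.
  List<String> destinationsFor(String sourceField) {
    List<String> result = new ArrayList<>();
    for (String[] rule : globRules) {
      if (globMatches(rule[0], sourceField)) {
        result.add(rule[1]);
      }
    }
    List<String> fixed = fixedRules.get(sourceField);
    if (fixed != null) {
      result.addAll(fixed);
    }
    return result;
  }

  // Number of rules that must be scanned on every lookup.
  int globRuleCount() {
    return globRules.size();
  }
}
```

In this sketch, registering 400 non-glob sources and one glob source leaves a 
single rule to scan per lookup; the other 400 are reached through the map.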

   


[jira] [Updated] (SOLR-6667) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents

2014-10-29 Thread Liram Vardi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liram Vardi updated SOLR-6667:
--
Attachment: SOLR-6667.patch

> Dynamic copy fields are considering all dynamic fields, causing a significant 
> performance impact on indexing documents
> --
>
> Key: SOLR-6667
> URL: https://issues.apache.org/jira/browse/SOLR-6667
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis, update
>Affects Versions: 4.8
> Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 
> specific CopyFields for dynamic fields, but without wildcards (the fields are 
> dynamic, the copy directive is not)
>Reporter: Liram Vardi
> Attachments: SOLR-6667.patch
>
>



[jira] [Updated] (SOLR-6667) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents

2014-10-29 Thread Liram Vardi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liram Vardi updated SOLR-6667:
--
Issue Type: Improvement  (was: Bug)

> Dynamic copy fields are considering all dynamic fields, causing a significant 
> performance impact on indexing documents
> --
>
> Key: SOLR-6667
> URL: https://issues.apache.org/jira/browse/SOLR-6667
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis, update
>Affects Versions: 4.8
> Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 
> specific CopyFields for dynamic fields, but without wildcards (the fields are 
> dynamic, the copy directive is not)
>Reporter: Liram Vardi
>






[jira] [Updated] (SOLR-6667) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents

2014-10-29 Thread Liram Vardi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liram Vardi updated SOLR-6667:
--
Issue Type: Bug  (was: Improvement)

> Dynamic copy fields are considering all dynamic fields, causing a significant 
> performance impact on indexing documents
> --
>
> Key: SOLR-6667
> URL: https://issues.apache.org/jira/browse/SOLR-6667
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis, update
>Affects Versions: 4.8
> Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 
> specific CopyFields for dynamic fields, but without wildcards (the fields are 
> dynamic, the copy directive is not)
>Reporter: Liram Vardi
>






[jira] [Updated] (SOLR-6667) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents

2014-10-29 Thread Liram Vardi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liram Vardi updated SOLR-6667:
--
Affects Version/s: 4.8

> Dynamic copy fields are considering all dynamic fields, causing a significant 
> performance impact on indexing documents
> --
>
> Key: SOLR-6667
> URL: https://issues.apache.org/jira/browse/SOLR-6667
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis, update
>Affects Versions: 4.8
> Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 
> specific CopyFields for dynamic fields, but without wildcards (the fields are 
> dynamic, the copy directive is not)
>Reporter: Liram Vardi
>






[jira] [Created] (SOLR-6667) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents

2014-10-29 Thread Liram Vardi (JIRA)
Liram Vardi created SOLR-6667:
-

 Summary: Dynamic copy fields are considering all dynamic fields, 
causing a significant performance impact on indexing documents
 Key: SOLR-6667
 URL: https://issues.apache.org/jira/browse/SOLR-6667
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis, update
 Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 
specific CopyFields for dynamic fields, but without wildcards (the fields are 
dynamic, the copy directive is not)
Reporter: Liram Vardi


Result:
After applying a fix for this issue, tests which we conducted show more than a 
40 percent improvement in our insertion performance.

Explanation:

Using a JVM profiler, we found a CPU "bottleneck" during the Solr indexing 
process. This bottleneck can be found in org.apache.solr.schema.IndexSchema, in 
the following method, "getCopyFieldsList()":

{code:title=getCopyFieldsList() |borderStyle=solid}
final List<CopyField> result = new ArrayList<>();
for (DynamicCopy dynamicCopy : dynamicCopyFields) {
  if (dynamicCopy.matches(sourceField)) {
    result.add(new CopyField(getField(sourceField), 
        dynamicCopy.getTargetField(sourceField), dynamicCopy.maxChars));
  }
}
List<CopyField> fixedCopyFields = copyFieldsMap.get(sourceField);
if (null != fixedCopyFields) {
  result.addAll(fixedCopyFields);
}
{code}

This function tries to find, for an input source field, all its copyFields (all 
the destinations to which Solr needs to copy this field). 
As you can probably note, the first part of the procedure is its most 
"expensive" step (it takes O(n) time, where n is the size of the 
"dynamicCopyFields" group).
The next part is just a simple "hash" lookup, which takes O(1) time. 
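
To make the cost difference concrete, here is a minimal, self-contained sketch. 
It is not Solr code: the class CopyFieldLookupSketch, its methods and the 
simplified glob matcher are hypothetical names invented for illustration, and 
copy rules are modelled as plain string pairs. It only shows that the scan 
touches every registered pattern, while the map needs a single lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the two lookup steps; not taken from IndexSchema. */
final class CopyFieldLookupSketch {
    /** Counts how many pattern comparisons the linear scan performs. */
    static int comparisons = 0;

    /** Linear part: every registered pattern is tested against the source field. */
    static List<String> scanPatterns(List<String[]> patterns, String sourceField) {
        List<String> targets = new ArrayList<>();
        for (String[] p : patterns) {            // p[0] = source pattern, p[1] = target
            comparisons++;
            if (matches(p[0], sourceField)) {
                targets.add(p[1]);
            }
        }
        return targets;
    }

    /** Simplified glob (assumption): a single leading or trailing '*' only. */
    static boolean matches(String pattern, String name) {
        if (pattern.startsWith("*")) {
            return name.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return name.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(name);
    }

    /** Constant-time part: explicit sources are found with one hash lookup. */
    static List<String> lookupFixed(Map<String, List<String>> fixed, String sourceField) {
        List<String> targets = fixed.get(sourceField);
        return targets == null ? new ArrayList<>() : targets;
    }

    public static void main(String[] args) {
        List<String[]> patterns = new ArrayList<>();
        Map<String, List<String>> fixed = new HashMap<>();
        // 400 explicit (non-wildcard) rules, registered both ways for comparison.
        for (int i = 0; i < 400; i++) {
            patterns.add(new String[] {"field" + i, "copy" + i});
            fixed.computeIfAbsent("field" + i, k -> new ArrayList<>()).add("copy" + i);
        }
        System.out.println(scanPatterns(patterns, "field399")
            + " found after " + comparisons + " comparisons");
        System.out.println(lookupFixed(fixed, "field399") + " found after one hash lookup");
    }
}
```

Since this scan runs for each source field looked up, the number of 
comparisons grows with both the number of fields and the number of rules.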

Our schema contains more than 500 copyFields, but only 70 of them are "indexed" 
fields. 
We also have one dynamic field with a wildcard ( * ), which "catches" the rest 
of the document fields. 
As you can conclude, we have more than 400 copyFields that are based on this 
dynamicField, but all except one are fixed (i.e. they do not contain any 
wildcard).

For some reason, the copyFields registration procedure defines those 400 
fields as "DynamicCopyField" and then stores them in the "dynamicCopyFields" 
array. 
This step makes getCopyFieldsList() very expensive (in CPU terms) without any 
justification: none of those 400 copyFields is a glob, so none of them needs 
any complex pattern matching against the input field. They can all be stored in 
"fixedCopyFields".
Only copyFields with asterisks need this "special" treatment, and they are 
(especially in our case) pretty rare.  

Therefore, we created a patch which fixes this problem by changing the 
registerCopyField() procedure.
Tests which we conducted show that there is no change in the indexing results. 
Moreover, the fix still successfully passes the class unit tests (i.e. 
IndexSchemaTest.java).







[jira] [Created] (SOLR-6666) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents

2014-10-29 Thread Liram Vardi (JIRA)
Liram Vardi created SOLR-:
-

 Summary: Dynamic copy fields are considering all dynamic fields, 
causing a significant performance impact on indexing documents
 Key: SOLR-
 URL: https://issues.apache.org/jira/browse/SOLR-
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis, update
 Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 
specific CopyFields for dynamic fields, but without wildcards (the fields are 
dynamic, the copy directive is not)
Reporter: Liram Vardi


Result:
After applying a fix for this issue, tests which we conducted show more than a 
40 percent improvement in our insertion performance.

Explanation:

Using a JVM profiler, we found a CPU "bottleneck" during the Solr indexing 
process. This bottleneck can be found in org.apache.solr.schema.IndexSchema, in 
the following method, "getCopyFieldsList()":

{code:title=getCopyFieldsList() |borderStyle=solid}
final List<CopyField> result = new ArrayList<>();
for (DynamicCopy dynamicCopy : dynamicCopyFields) {
  if (dynamicCopy.matches(sourceField)) {
    result.add(new CopyField(getField(sourceField), 
        dynamicCopy.getTargetField(sourceField), dynamicCopy.maxChars));
  }
}
List<CopyField> fixedCopyFields = copyFieldsMap.get(sourceField);
if (null != fixedCopyFields) {
  result.addAll(fixedCopyFields);
}
{code}

This function tries to find, for an input source field, all its copyFields (all 
the destinations to which Solr needs to copy this field). 
As you can probably note, the first part of the procedure is its most 
"expensive" step (it takes O(n) time, where n is the size of the 
"dynamicCopyFields" group).
The next part is just a simple "hash" lookup, which takes O(1) time. 

Our schema contains more than 500 copyFields, but only 70 of them are "indexed" 
fields. 
We also have one dynamic field with a wildcard (*), which "catches" the rest 
of the document fields. 
As you can conclude, we have more than 400 copyFields that are based on this 
dynamicField, but all except one are fixed (i.e. they do not contain any 
wildcard).

For some reason, the copyFields registration procedure defines those 400 
fields as "DynamicCopyField" and then stores them in the "dynamicCopyFields" 
array. 
This step makes getCopyFieldsList() very expensive (in CPU terms) without any 
justification: none of those 400 copyFields is a glob, so none of them needs 
any complex pattern matching against the input field. They can all be stored in 
"fixedCopyFields".
Only copyFields with asterisks need this "special" treatment, and they are 
(especially in our case) pretty rare.  

Therefore, we created a patch which fixes this problem by changing the 
registerCopyField() procedure.
Tests which we conducted show that there is no change in the indexing results. 
Moreover, the fix still successfully passes the class unit tests (i.e. 
IndexSchemaTest.java).
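
The registration idea can be sketched as follows. This is only an illustration 
under stated assumptions, not the attached patch and not the IndexSchema code: 
the class CopyFieldRegistrySketch and all of its members are hypothetical, copy 
rules are reduced to (source, destination) string pairs, and wildcard 
destinations and maxChars are ignored. Sources without an asterisk go straight 
into the map, so only real globs are left to scan at lookup time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of glob-aware registration; names do not mirror Solr internals. */
final class CopyFieldRegistrySketch {
    /** Explicit source name -> destinations; consulted with one hash lookup. */
    private final Map<String, List<String>> fixedCopyFields = new HashMap<>();
    /** Wildcard rules only, each stored as {sourceGlob, destination}. */
    private final List<String[]> globCopyFields = new ArrayList<>();

    /** Routes a copy directive by whether its source contains an asterisk. */
    void register(String source, String dest) {
        if (source.indexOf('*') >= 0) {
            globCopyFields.add(new String[] {source, dest});
        } else {
            fixedCopyFields.computeIfAbsent(source, k -> new ArrayList<>()).add(dest);
        }
    }

    /** Collects destinations: scan the (short) glob list, then do one map lookup. */
    List<String> destinationsFor(String sourceField) {
        List<String> result = new ArrayList<>();
        for (String[] rule : globCopyFields) {
            if (globMatches(rule[0], sourceField)) {
                result.add(rule[1]);
            }
        }
        List<String> fixed = fixedCopyFields.get(sourceField);
        if (fixed != null) {
            result.addAll(fixed);
        }
        return result;
    }

    /** Number of rules that still need pattern matching. */
    int globRuleCount() {
        return globCopyFields.size();
    }

    /** Simplified glob (assumption): a single leading or trailing '*' only. */
    private static boolean globMatches(String glob, String name) {
        if (glob.startsWith("*")) {
            return name.endsWith(glob.substring(1));
        }
        if (glob.endsWith("*")) {
            return name.startsWith(glob.substring(0, glob.length() - 1));
        }
        return glob.equals(name);
    }

    public static void main(String[] args) {
        CopyFieldRegistrySketch registry = new CopyFieldRegistrySketch();
        for (int i = 0; i < 400; i++) {
            registry.register("attr_" + i, "text_all");   // explicit sources
        }
        registry.register("*_txt", "text_all");            // the only real glob
        System.out.println("glob rules to scan: " + registry.globRuleCount());
        System.out.println("attr_7 -> " + registry.destinationsFor("attr_7"));
        System.out.println("body_txt -> " + registry.destinationsFor("body_txt"));
    }
}
```

With this split, the per-field scan in the sketch covers only the wildcard 
rules, however many explicit rules are registered.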

   






[jira] [Updated] (SOLR-6132) UpdateRequest contains only XML ContentStream and not JavaBin

2014-06-03 Thread Liram Vardi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liram Vardi updated SOLR-6132:
--

Labels: UpdateProcessor Updating javabincodec solrj  (was: )

> UpdateRequest contains only XML ContentStream and not JavaBin 
> --
>
> Key: SOLR-6132
> URL: https://issues.apache.org/jira/browse/SOLR-6132
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java, update
>Reporter: Liram Vardi
>  Labels: UpdateProcessor, Updating, javabincodec, solrj
>
> When creating an UpdateRequest using the following code, I noticed that even 
> though the request params include wt=javabin, the final request is 
> translated to XML.
> I guess that this is because the collection of ContentStreams that is 
> returned by the UpdateRequest.getContentStreams() method contains only an XML 
> ContentStream. 
>  
> Shouldn't UpdateRequest contain a JavaBin ContentStream by default, or 
> when it gets some parameter (such as wt=javabin)?  
> The code:
>  UpdateRequest updateRequest = new UpdateRequest();
>  updateRequest.add(solrDocument);
>  updateRequest.setCommitWithin(-1);
>  SolrRequestParsers _parser = new SolrRequestParsers(null);
>  SolrQueryRequest req;
>  try {
>req = _parser.buildRequestFrom(targetCore, params,   
> {color:red}updateRequest.getContentStreams(){color});
>  } catch (Exception e) {
>   throw new SolrServerException(e);
>  }



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6132) UpdateRequest contains only XML ContentStream and not JavaBin

2014-06-03 Thread Liram Vardi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liram Vardi updated SOLR-6132:
--

Summary: UpdateRequest contains only XML ContentStream and not JavaBin   
(was: UpdateRequest only ContentStream is XML and not JavaBin )

> UpdateRequest contains only XML ContentStream and not JavaBin 
> --
>
> Key: SOLR-6132
> URL: https://issues.apache.org/jira/browse/SOLR-6132
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java, update
>Reporter: Liram Vardi
>






[jira] [Created] (SOLR-6132) UpdateRequest only ContentStream is XML and not JavaBin

2014-06-03 Thread Liram Vardi (JIRA)
Liram Vardi created SOLR-6132:
-

 Summary: UpdateRequest only ContentStream is XML and not JavaBin 
 Key: SOLR-6132
 URL: https://issues.apache.org/jira/browse/SOLR-6132
 Project: Solr
  Issue Type: Improvement
  Components: clients - java, update
Reporter: Liram Vardi


When creating an UpdateRequest using the following code, I noticed that even 
though the request params include wt=javabin, the final request is 
translated to XML.
I guess that this is because the collection of ContentStreams that is 
returned by the UpdateRequest.getContentStreams() method contains only an XML 
ContentStream. 
 
Shouldn't UpdateRequest contain a JavaBin ContentStream by default, or when 
it gets some parameter (such as wt=javabin)?  

The code:
 UpdateRequest updateRequest = new UpdateRequest();
 updateRequest.add(solrDocument);
 updateRequest.setCommitWithin(-1);
 SolrRequestParsers _parser = new SolrRequestParsers(null);
 SolrQueryRequest req;
 try {
   req = _parser.buildRequestFrom(targetCore, params,   
{color:red}updateRequest.getContentStreams(){color});
 } catch (Exception e) {
  throw new SolrServerException(e);
 }







[jira] [Created] (SOLR-6113) Edismax doesn't parse well the query uf (User Fields)

2014-05-25 Thread Liram Vardi (JIRA)
Liram Vardi created SOLR-6113:
-

 Summary: Edismax doesn't parse well the query uf (User Fields)
 Key: SOLR-6113
 URL: https://issues.apache.org/jira/browse/SOLR-6113
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Reporter: Liram Vardi


It seems that the Edismax User Fields feature does not behave as expected.
For instance, given the following query:
_"q= id:b* user:"Anna Collins"&defType=edismax&uf=* -user&rows=0"_
The parsed query (taken from the query debug info) is:
_+((id:b* (text:user) (text:"anna collins"))~1)_

I expect that, because "user" was filtered out in "uf" (User Fields), the parsed 
query should not contain the "user" search part.
In other words, the parsed query should look simply like this:  _+id:b*_

This issue is affected by the patch on issue SOLR-2649: when changing the 
default operator of Edismax to AND, the query results change.
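
The expectation above rests on one reading of the "uf" value, which can be 
sketched as follows. This is not Solr's parser: the class UserFieldsSketch and 
its methods are hypothetical, and the sketch assumes that entries are separated 
by whitespace or commas, that a leading "-" excludes a field, and that 
exclusions win over inclusions. It only decides whether a field name may be 
queried explicitly; it does not rewrite the query.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reading of a uf value such as "* -user"; not Solr's implementation. */
final class UserFieldsSketch {
    private final List<String> allowed = new ArrayList<>();
    private final List<String> denied = new ArrayList<>();

    /** Splits the uf value on whitespace or commas; "-name" entries are exclusions. */
    UserFieldsSketch(String uf) {
        for (String token : uf.trim().split("[\\s,]+")) {
            if (token.isEmpty()) {
                continue;
            }
            if (token.startsWith("-")) {
                denied.add(token.substring(1));
            } else {
                allowed.add(token);
            }
        }
    }

    /** A field is usable if no exclusion matches it and some inclusion does. */
    boolean isAllowed(String field) {
        for (String glob : denied) {
            if (globMatches(glob, field)) {
                return false;
            }
        }
        for (String glob : allowed) {
            if (globMatches(glob, field)) {
                return true;
            }
        }
        return false;
    }

    /** Simplified glob (assumption): a single leading or trailing '*' only. */
    private static boolean globMatches(String glob, String name) {
        if (glob.startsWith("*")) {
            return name.endsWith(glob.substring(1));
        }
        if (glob.endsWith("*")) {
            return name.startsWith(glob.substring(0, glob.length() - 1));
        }
        return glob.equals(name);
    }

    public static void main(String[] args) {
        UserFieldsSketch uf = new UserFieldsSketch("* -user");
        System.out.println("id allowed: " + uf.isAllowed("id"));
        System.out.println("user allowed: " + uf.isAllowed("user"));
    }
}
```

Under this reading, "id" passes the filter and "user" does not. What should 
happen to the rejected clause (dropped, or searched as plain text) is the 
question this issue raises.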




