RE: Ignoring Duplicates in Multivalue Field

2014-11-03 Thread Tomer Levi
Hi Ahmet,
When I add RunUpdateProcessorFactory, Solr still doesn't remove any duplicates.
Any other idea?


-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID] 
Sent: Monday, November 03, 2014 1:35 AM
To: solr-user@lucene.apache.org
Subject: Re: Ignoring Duplicates in Multivalue Field

Hi Tomer,

What happens when you add <processor class="solr.RunUpdateProcessorFactory"/> to your chain?

Ahmet



On Sunday, November 2, 2014 1:22 PM, Tomer Levi  wrote:



Hi,
I'm trying to make my "update" request handler ignore duplicate values in a 
multivalued field during updates.
To make my use case clear, let's assume my index already contains a document 
like:
{
  "id": "100",
  "myMultValueField": ["1", "2", "3"]
}

Later I would like to send an update like:
{
  "id": "100",
  "myMultValueField": {"add": "2"}
}

How can I make the update request handler understand that "2" already exists and 
ignore it?
I tried to add the update chain below, but it didn't work for me.


   
 myMultValueField 
  
   

And added it to my requestHandler:
   
   
 uniq-fields
   
    

Tomer Levi 
Software Engineer  
Big Data Group 
Product & Technology Unit 
(T) +972 (9) 775-2693 

tomer.l...@nice.com  
www.nice.com   


Ignoring Duplicates in Multivalue Field

2014-11-02 Thread Tomer Levi
Hi,
I'm trying to make my "update" request handler ignore duplicate values in a 
multivalued field during updates.
To make my use case clear, let's assume my index already contains a document 
like:
{
  "id": "100",
  "myMultValueField": ["1", "2", "3"]
}

Later I would like to send an update like:
{
  "id": "100",
  "myMultValueField": {"add": "2"}
}

How can I make the update request handler understand that "2" already exists and 
ignore it?
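The behavior asked for here is a set-like "add": keep the stored values and append only values that are not already present. Below is a plain-Java illustration (not a Solr API). It assumes the client has already fetched the stored values and would send the merged list back as the full field value. `addDistinct` is a hypothetical helper. Newer Solr releases, well after this 2014 thread, document an `add-distinct` atomic-update operation with this behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class DistinctAddSketch {

    // Set-like "add": keep the stored values in order and append only unseen values.
    // Note: duplicates already present in `existing` are collapsed as well.
    static List<String> addDistinct(List<String> existing, List<String> toAdd) {
        LinkedHashSet<String> merged = new LinkedHashSet<>(existing);
        merged.addAll(toAdd); // adding a value the set already contains is a no-op
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        List<String> stored = Arrays.asList("1", "2", "3");
        System.out.println(addDistinct(stored, Arrays.asList("2")));      // [1, 2, 3]
        System.out.println(addDistinct(stored, Arrays.asList("2", "4"))); // [1, 2, 3, 4]
    }
}
```

The LinkedHashSet keeps insertion order, so existing values stay where they were and new ones are appended at the end.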
I tried to add the update chain below, but it didn't work for me.


   
 myMultValueField 
  
   

And added it to my requestHandler:

   
 uniq-fields
   


Tomer Levi






RE: CopyField from text to multi value

2014-10-20 Thread Tomer Levi
Thanks Walter!

-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org] 
Sent: Monday, October 20, 2014 12:09 AM
To: solr-user@lucene.apache.org
Subject: Re: CopyField from text to multi value

I think that info is available with termvectors. That should give a list of the 
query terms that matched each document, if I understand it correctly.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/


On Oct 19, 2014, at 7:37 AM, Tomer Levi  wrote:

> Thanks again for the help.
> 
> The use case is this.
> 
> In my UI I would like to indicate which words led to each document in the 
> response.
> 
> It actually seems like a simple highlighting case, but instead of getting the 
> highlight result as "this is a long string with text",
> our UI team wants a list of words, i.e. ["long", "with"].
> 
> So, I assumed that I can just tokenize the original text -> copy the tokens 
> into new multi-value fields -> ask Solr to highlight the multi-value field.
> 
> That is my use case.
> 
> Thanks again
> 
> Tomer
> 
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Sunday, October 19, 2014 5:18 PM
> To: solr-user@lucene.apache.org
> Subject: Re: CopyField from text to multi value
> 
> This really feels like an XY problem, which I think Jack is alluding to.
> 
> bq: I understand that the analysis chain is applied after the raw input was 
> copied.
> I need to store the output of the analysis chain as a new multi-value field
> 
> This statement is really confusing. You can't have the output of the analysis 
> chain used as input to a copyField; it just doesn't work that way, which is 
> what you seem to want to do with the second sentence. Then you bring shingles 
> into the picture...
> 
> So let's take Jack's suggestion and back up: tell us what use case 
> you're trying to support, rather than leaving us to guess what problem 
> you're trying to solve.
> 
> Best,
> Erick
> 
> On Sun, Oct 19, 2014 at 9:43 AM, Jack Krupansky 
> <j...@basetechnology.com> wrote:
> 
>> As always, you need to first examine how you intend to query the fields 
>> before you dive into data modeling. In this case, is there any particular 
>> reason that you need the individual terms as separate values, as opposed to 
>> simply using a tokenized text field?
>> 
>> -- Jack Krupansky
>> 
>> From: Tomer Levi
>> Sent: Sunday, October 19, 2014 9:07 AM
>> To: solr-user@lucene.apache.org
>> Subject: CopyField from text to multi value
>> 
>> Hi,
>> 
>> I would like to copy a textual field's content into a multivalued field.
>> 
>> For example,
>> let's say my field text contains: "I am a solr user"
>> 
>> I would like to have a multi-value copyField with the following
>> content: ["I", "am", "a", "solr", "user"]
>> 
>> Thanks,
>> 
>>  Tomer Levi



RE: CopyField from text to multi value

2014-10-19 Thread Tomer Levi
Thanks again for the help.

The use case is this.

In my UI I would like to indicate which words led to each document in the 
response.

It actually seems like a simple highlighting case, but instead of getting the 
highlight result as "this is a long string with text",
our UI team wants a list of words, i.e. ["long", "with"].

So, I assumed that I can just tokenize the original text -> copy the tokens 
into new multi-value fields -> ask Solr to highlight the multi-value field.

That is my use case.
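If the goal is only a list of matched words per document, one option is to keep normal highlighting and extract the marked terms from each snippet on the client. This option is an assumption and was not proposed in the thread. The sketch assumes Solr's default `<em>`/`</em>` highlight markers (`hl.simple.pre`/`hl.simple.post`), and that "long" and "with" were the highlighted words in the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HighlightTermsSketch {

    // Assumes the default highlight markers; change the pattern if hl.simple.pre/post differ.
    private static final Pattern MARKED = Pattern.compile("<em>(.*?)</em>");

    // Returns the distinct highlighted terms in a snippet, in order of first appearance.
    static List<String> highlightedTerms(String snippet) {
        List<String> terms = new ArrayList<>();
        Matcher m = MARKED.matcher(snippet);
        while (m.find()) {
            String term = m.group(1);
            if (!terms.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String snippet = "this is a <em>long</em> string <em>with</em> text";
        System.out.println(highlightedTerms(snippet)); // [long, with]
    }
}
```

Because this reuses highlighting on the original text field, it avoids copying tokens into a separate multivalued field.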

Thanks again

Tomer

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Sunday, October 19, 2014 5:18 PM
To: solr-user@lucene.apache.org
Subject: Re: CopyField from text to multi value

This really feels like an XY problem, which I think Jack is alluding to.

bq: I understand that the analysis chain is applied after the raw input was 
copied.
I need to store the output of the analysis chain as a new multi-value field

This statement is really confusing. You can't have the output of the analysis 
chain used as input to a copyField; it just doesn't work that way, which is what 
you seem to want to do with the second sentence. Then you bring shingles into 
the picture...

So let's take Jack's suggestion and back up: tell us what use case you're 
trying to support, rather than leaving us to guess what problem you're trying 
to solve.

Best,
Erick

On Sun, Oct 19, 2014 at 9:43 AM, Jack Krupansky 
<j...@basetechnology.com> wrote:

> As always, you need to first examine how you intend to query the fields 
> before you dive into data modeling. In this case, is there any particular 
> reason that you need the individual terms as separate values, as opposed to 
> simply using a tokenized text field?
>
> -- Jack Krupansky
>
> From: Tomer Levi
> Sent: Sunday, October 19, 2014 9:07 AM
> To: solr-user@lucene.apache.org
> Subject: CopyField from text to multi value
>
> Hi,
>
> I would like to copy a textual field's content into a multivalued field.
>
> For example,
> let's say my field text contains: "I am a solr user"
>
> I would like to have a multi-value copyField with the following
> content: ["I", "am", "a", "solr", "user"]
>
> Thanks,
>
>   Tomer Levi


RE: CopyField from text to multi value

2014-10-19 Thread Tomer Levi

Hi Erick,
Thanks for the explanation. I understand that the analysis chain is applied 
after the raw input was copied.
I need to store the output of the analysis chain as a new multi-value field, 
and I think ShingleFilterFactory might do that. Is that right?

Tomer

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Sunday, October 19, 2014 4:31 PM
To: solr-user@lucene.apache.org
Subject: Re: CopyField from text to multi value

Not quite sure what you're asking here. If you do a copyField, the raw input 
is, well, copied to the destination field and _then_ the analysis chain is 
applied. Which seems to be what you want, the destination field would be a 
text-based field, perhaps text_general or some such from the distro.

And perhaps there's some confusion about what multiValued means here. It does 
_not_ mean "tokenized", i.e. broken up into words. Non-multiValued fields can 
be tokenized.

multiValued means that more than one entry for the field can be in a doc.
I.e. (using the XML form of an input doc as an example)


  
  some text
  and now for something completely different  
 

will succeed with a field defined as multiValued="true", but fail with 
something with multiValued="false".

In either case, though, whether the input was broken up into multiple, 
independently-searchable tokens (words) is orthogonal to whether it's 
multiValued or not, and is entirely dependent on the analysis chain in the 
<fieldType> for the field in question.
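The difference between values and tokens in this explanation can be illustrated with a small plain-Java sketch. `analyze` is a deliberately crude, hypothetical stand-in for an analysis chain: it lowercases the text and splits on whitespace. Real Solr tokenizers and filters do much more. The point is that the value count and the token count are independent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ValuesVsTokensSketch {

    // Crude stand-in for an analysis chain: lowercase, then split on whitespace.
    static List<String> analyze(String value) {
        String normalized = value.trim().toLowerCase();
        if (normalized.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(normalized.split("\\s+"));
    }

    public static void main(String[] args) {
        // Two values for one field: only accepted if the field is multiValued="true".
        List<String> values = Arrays.asList(
                "some text",
                "and now for something completely different");

        int tokenCount = 0;
        for (String v : values) {
            List<String> tokens = analyze(v); // each value is analyzed on its own
            tokenCount += tokens.size();
            System.out.println(v + " -> " + tokens);
        }
        // Prints "values: 2, tokens: 8": multiValued governs the first number,
        // the analysis chain governs the second.
        System.out.println("values: " + values.size() + ", tokens: " + tokenCount);
    }
}
```

A single-valued field holding "I am a solr user" would still yield five tokens. That is why a tokenized text field, rather than a multivalued one, is usually what is needed.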

Best,
Erick

On Sun, Oct 19, 2014 at 9:07 AM, Tomer Levi  wrote:

> Hi,
>
> I would like to copy a textual field's content into a multivalued field.
>
> For example,
>
> let's say my field text contains: "I am a solr user"
>
> I would like to have a multi-value copyField with the following content:
> ["I", "am", "a", "solr", "user"]
>
> Thanks,
>
> Tomer Levi


CopyField from text to multi value

2014-10-19 Thread Tomer Levi
Hi,
I would like to copy a textual field's content into a multivalued field.
For example,
let's say my field text contains: "I am a solr user"
I would like to have a multi-value copyField with the following content: ["I", 
"am", "a", "solr", "user"]

Thanks,
Tomer Levi






RE: multiple terms order in query - eDismax

2014-09-28 Thread Tomer Levi
Thanks Jack!
Do you have any idea how I can select documents according to the order in which 
the terms appear?

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: Sunday, September 28, 2014 1:27 PM
To: solr-user@lucene.apache.org
Subject: Re: multiple terms order in query - eDismax

pf and ps merely control boosting of documents, not selection of documents.

mm controls selection of documents.

So, hopefully at least doc3 is returned before doc2.
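Jack's point can be checked by hand. The plain-Java sketch below involves no Solr. It counts how many query terms each document contains, which is what mm=2 uses to select documents. It also computes the slop that the in-order phrase "home garden" would need, which only affects boosting. This is a simplification: eDismax builds the pf phrase from all the query terms ("home garden apple"), so a pairwise check is closer to what pf2/ps2 do.

```java
import java.util.Arrays;
import java.util.List;

public class SlopSketch {

    // How many query terms appear in the document: this is what mm=2 tests.
    static int matchedTerms(List<String> docTokens, List<String> queryTerms) {
        int n = 0;
        for (String term : queryTerms) {
            if (docTokens.contains(term)) {
                n++;
            }
        }
        return n;
    }

    // Slop needed for the in-order phrase "first second" (first occurrences only):
    // adjacent words need 0, one word in between needs 1, and so on.
    // Simplification: returns -1 if a term is missing or the order is reversed.
    static int inOrderSlop(List<String> docTokens, String first, String second) {
        int i = docTokens.indexOf(first);
        int j = docTokens.indexOf(second);
        if (i < 0 || j < 0 || j < i) {
            return -1;
        }
        return j - i - 1;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("home", "garden", "apple");
        String[][] docs = {
            {"Doc1", "home garden sky sea wolf"},
            {"Doc2", "home wolf sea garden sky"},
            {"Doc3", "wolf sea home garden sky"},
        };
        for (String[] doc : docs) {
            List<String> tokens = Arrays.asList(doc[1].split(" "));
            int matched = matchedTerms(tokens, query);
            int slop = inOrderSlop(tokens, "home", "garden");
            System.out.println(doc[0] + ": matched=" + matched
                    + " selected(mm=2)=" + (matched >= 2)
                    + " slop=" + slop
                    + " withinPs1=" + (slop >= 0 && slop <= 1));
        }
    }
}
```

Running it prints:

Doc1: matched=2 selected(mm=2)=true slop=0 withinPs1=true
Doc2: matched=2 selected(mm=2)=true slop=2 withinPs1=false
Doc3: matched=2 selected(mm=2)=true slop=0 withinPs1=true

All three documents pass mm=2, so all three are returned. Doc2 needs slop 2, so at most it misses a proximity boost.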

-- Jack Krupansky

From: Tomer Levi 
Sent: Sunday, September 28, 2014 5:39 AM
To: solr-user@lucene.apache.org 
Subject: multiple terms order in query - eDismax

Hi,

We have an index with 3 documents. Each document contains a single field 
(besides the id); let's call it 'text':

* Doc1: text:home garden sky sea wolf
* Doc2: text:home wolf sea garden sky
* Doc3: text:wolf sea home garden sky

When executing the query: home garden apple,
using these eDismax params:

* pf=text
* ps=1
* mm=2

we would like to get Doc1 and Doc3, in other words all the documents having at 
least 2 terms in close proximity (only 1 term off).

The problem is that we get all 3 documents; it looks like the 'ps' parameter 
doesn't count.

Why is Doc2 included in the results? We expected Solr to omit it, since the 
slop it needs is larger than 1 => we have home wolf sea garden (ps=2?)

Tomer Levi
 


multiple terms order in query - eDismax

2014-09-28 Thread Tomer Levi
Hi,
We have an index with 3 documents. Each document contains a single field 
(besides the id); let's call it 'text':

* Doc1: text:home garden sky sea wolf
* Doc2: text:home wolf sea garden sky
* Doc3: text:wolf sea home garden sky

When executing the query: home garden apple,
using these eDismax params:

* pf=text
* ps=1
* mm=2

we would like to get Doc1 and Doc3, in other words all the documents having at 
least 2 terms in close proximity (only 1 term off).

The problem is that we get all 3 documents; it looks like the 'ps' parameter 
doesn't count.
Why is Doc2 included in the results? We expected Solr to omit it, since the 
slop it needs is larger than 1 => we have home wolf sea garden (ps=2?)



Tomer Levi






RE: Indexing documents with ContentStreamUpdateRequest (SolrJ) asynchronously

2014-08-30 Thread Tomer Levi
Hi Jorge,

In my indexing code I've created the following Callable class:



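import java.util.Collection;
import java.util.concurrent.Callable;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.SolrInputDocument;

// Indexes one batch of documents; meant to be run by an ExecutorService
public class IndexerThread implements Callable<UpdateResponse> {

    private final SolrServer solrServer;
    private final Collection<SolrInputDocument> documentsToIndex;

    public IndexerThread(SolrServer solrServer, Collection<SolrInputDocument> documentsToIndex) {
        this.solrServer = solrServer;
        this.documentsToIndex = documentsToIndex;
    }

    @Override
    public UpdateResponse call() throws Exception {
        // Sends the whole batch in a single add request
        return solrServer.add(documentsToIndex);
    }
}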





Then I used the code below to create threads:



SolrServer solrServer = createSolrServer();
List<Callable<UpdateResponse>> threads = createThreads(solrServer, documentsToIndex);
ExecutorService executor = Executors.newFixedThreadPool(threads.size());

// invokeAll() blocks until every task has finished (normally or with an
// exception), so no polling loop is needed afterwards
List<Future<UpdateResponse>> futures = executor.invokeAll(threads);
executor.shutdown();

for (Future<UpdateResponse> result : futures) {
    // get() does not block here; a failed task rethrows as ExecutionException
    UpdateResponse resp = result.get();
    //do something with the response
}
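The createThreads(...) helper isn't shown in this message. One hypothetical way to build its input is to split the documents into fixed-size batches and create one IndexerThread per batch. The generic plain-Java helper below sketches only that splitting step. `partition` is an illustrative name, and the batch size is an assumption you would tune.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchSketch {

    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // copy the subList view so each batch is independent of the source list
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("1", "2", "3", "4", "5");
        System.out.println(partition(ids, 2)); // [[1, 2], [3, 4], [5]]
    }
}
```

Each inner list would then become the documentsToIndex of one IndexerThread.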



Hope it helps,

Tomer



-Original Message-
From: Jorge Moreira [mailto:j.moreira...@gmail.com]
Sent: Thursday, August 28, 2014 11:50 AM
To: solr-user@lucene.apache.org
Subject: Indexing documents with ContentStreamUpdateRequest (SolrJ) 
asynchronously



I am using the SolrJ 4.8 API to index rich documents into Solr, but I want to 
index these documents asynchronously. The function I wrote sends documents 
synchronously, and I don't know how to change it to work asynchronously. Any 
idea?



Function:



public Boolean indexDocument(HttpSolrServer server, String PathFile, InputReader external) {

    ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");

    try {
        up.addFile(new File(PathFile), "text");
    } catch (IOException e) {
        Logger.getLogger(ANOIndexer.class.getName()).log(Level.SEVERE, null, e);
        return false;
    }

    up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

    try {
        server.request(up);
    } catch (SolrServerException e) {
        Logger.getLogger(ANOIndexer.class.getName()).log(Level.SEVERE, null, e);
        return false;
    } catch (IOException e) {
        Logger.getLogger(ANOIndexer.class.getName()).log(Level.SEVERE, null, e);
        return false;
    }
    return true;
}



Solr server: version 4.8.
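If the caller should not block at all (invokeAll() in the reply above waits for every task), another option is to run the existing synchronous method on an executor and return a CompletableFuture (Java 8+). In this sketch, `indexDocument` is a hypothetical stand-in for the function above, reduced to a path check so the example runs without a Solr server. The pool size is an assumption.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncIndexSketch {

    // Hypothetical stand-in for the synchronous indexDocument(...): the real one
    // would build the ContentStreamUpdateRequest and call server.request(up).
    static boolean indexDocument(String pathFile) {
        return pathFile != null && !pathFile.isEmpty();
    }

    // Submits the blocking call to the pool and returns immediately.
    static CompletableFuture<Boolean> indexDocumentAsync(String pathFile, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> indexDocument(pathFile), pool);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            CompletableFuture<Boolean> pending = indexDocumentAsync("/tmp/report.pdf", pool);
            // React when the request finishes instead of waiting for it
            CompletableFuture<Void> logged =
                    pending.thenAccept(ok -> System.out.println("indexed: " + ok));
            // ... the caller is free to do other work here ...
            logged.join(); // only so this demo prints before the pool shuts down
        } finally {
            pool.shutdown();
        }
    }
}
```

In a long-running indexer, the pool would be created once and shared rather than per call.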


RE: Help Highlight Snippets Score

2014-08-26 Thread Tomer Levi
Thanks!

-Original Message-
From: Markus Klose [mailto:markus.kl...@shi-gmbh.com] 
Sent: Friday, August 22, 2014 10:16 AM
To: solr-user@lucene.apache.org
Subject: RE: Help Highlight Snippets Score

HI Tomer,

I guess you are looking for a different fragment builder.
There is one called ScoreOrderFragmentsBuilder, which is probably not exactly 
what you need, but at least it orders the snippets by score. That should work 
for you.

http://wiki.apache.org/solr/HighlightingParameters#hl.fragmentsBuilder


Best regards from Augsburg

Markus Klose

-Original Message-
From: Tomer Levi [mailto:tomer.l...@nice.com] 
Sent: Thursday, August 21, 2014 4:49 PM
To: solr-user@lucene.apache.org
Subject: Help Highlight Snippets Score

Hi,

I have a document with a textual field, and I would like to sort the highlighted 
snippets by the number of term occurrences.

For instance, when I have the following snippets:

"Solr Solr Solr"

"Solr Solr"

"Solr Solr Solr Solr"



I would like to get them ordered as:

"Solr Solr Solr Solr"

"Solr Solr Solr"

"Solr Solr"



After debugging the Highlighter, I saw that it uses 
org.apache.lucene.search.highlight.QueryScorer, and this scorer gives the same 
score to all the snippets.

* Is there any other Scorer I can use?

* How can I set a different scorer in solrconfig.xml?



Thanks,

Tomer




Help Highlight Snippets Score

2014-08-21 Thread Tomer Levi
Hi,

I have a document with a textual field, and I would like to sort the highlighted 
snippets by the number of term occurrences.

For instance, when I have the following snippets:

"Solr Solr Solr"

"Solr Solr"

"Solr Solr Solr Solr"



I would like to get them ordered as:

"Solr Solr Solr Solr"

"Solr Solr Solr"

"Solr Solr"



After debugging the Highlighter, I saw that it uses 
org.apache.lucene.search.highlight.QueryScorer, and this scorer gives the same 
score to all the snippets.

* Is there any other Scorer I can use?

* How can I set a different scorer in solrconfig.xml?
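As a client-side fallback, the requested ordering can be reproduced by counting term occurrences in each returned snippet and sorting on that count. This fallback is an assumption, not a Solr scorer, and it was not suggested in the thread. The sketch assumes plain snippet text as in the example. With Solr's highlight markers, you would count the marked terms instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SnippetSortSketch {

    // Counts whole-token occurrences of term (case-sensitive, whitespace-split).
    static int occurrences(String snippet, String term) {
        int n = 0;
        for (String token : snippet.trim().split("\\s+")) {
            if (token.equals(term)) {
                n++;
            }
        }
        return n;
    }

    // Returns a copy of snippets ordered by descending occurrence count of term.
    static List<String> sortByOccurrences(List<String> snippets, String term) {
        List<String> sorted = new ArrayList<>(snippets);
        sorted.sort(Comparator.comparingInt((String s) -> occurrences(s, term)).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<String> snippets = Arrays.asList("Solr Solr Solr", "Solr Solr", "Solr Solr Solr Solr");
        for (String s : sortByOccurrences(snippets, "Solr")) {
            System.out.println(s);
        }
    }
}
```

This prints "Solr Solr Solr Solr", "Solr Solr Solr", and "Solr Solr", in that order. List.sort is stable, so snippets with equal counts keep their original relative order.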



Thanks,

Tomer