Re: Classification with percolator

Binh Ly Wed, 22 Jan 2014 09:30:06 -0800

Arthur,

I am assuming that you will define a query/rule for each tag, so in your 
case yes, that would be the way to define the percolator queries.
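
As a rough sketch of the one-query-per-tag idea (not a definitive 
implementation), the registrations can be generated in a loop instead of being 
written by hand. The index name and the .percolator path come from the curl 
example quoted below; the "body" field and the match query are assumptions made 
only for illustration, and nothing is sent to a cluster here.

```python
import json
from urllib.parse import quote


def percolator_registrations(tag_queries, index="my-index1", field="body"):
    """Yield one (path, json_body) pair per tag.

    tag_queries maps a tag name to the text its query should match.
    `field` is a hypothetical document field, assumed for this sketch.
    """
    for tag, text in sorted(tag_queries.items()):
        # The tag name doubles as the id of the registered percolator query.
        path = "/%s/.percolator/%s" % (index, quote(tag, safe=""))
        body = {"query": {"match": {field: text}}}
        yield path, json.dumps(body, sort_keys=True)


if __name__ == "__main__":
    # Tags taken from the original question; each becomes one PUT request.
    tags = {"football": "football", "art": "art"}
    for path, body in percolator_registrations(tags):
        print("PUT", path, body)
```

Each printed line corresponds to one PUT against the .percolator type, so a 
thousand tags means a thousand small registrations.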


A couple of things you might want to be aware of:

1) Percolation is CPU-intensive.
2) The fewer queries you percolate against, the better. So when you call the 
percolate API, see if you can also pass in a query or filter that limits 
which registered queries are percolated against.
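
To illustrate point 2, here is a minimal sketch of a percolate request body 
that carries a filter, as I understand the 1.0 percolate API. It only builds 
the request and does not send it. The type name "article" and the "category" 
metadata field are hypothetical; the filter can only narrow things down if the 
registered queries were stored with such a field alongside their "query".

```python
import json


def percolate_request(doc, index="my-index1", doc_type="article", filter_=None):
    """Build (path, json_body) for a percolate call.

    `doc_type` and any field named in `filter_` are assumptions for this
    sketch; `filter_` restricts which registered queries are evaluated.
    """
    body = {"doc": doc}
    if filter_ is not None:
        # Only registered queries whose metadata matches the filter are run.
        body["filter"] = filter_
    path = "/%s/%s/_percolate" % (index, doc_type)
    return path, json.dumps(body, sort_keys=True)


if __name__ == "__main__":
    path, body = percolate_request(
        {"body": "Local football club wins the cup"},
        filter_={"term": {"category": "sports"}},
    )
    print("GET", path)
    print(body)
```

Without the filter argument, the body contains only "doc" and every registered 
query is a candidate.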

On Wednesday, January 22, 2014 5:12:54 AM UTC-5, Arthur Denning wrote:
>
> Hey Binh, thanks a lot. It is really nice to hear from someone with 
> practical experience on this. Is it correct to say that if I had a thousand 
> tags, I would need to make a thousand calls like 
>
> curl -XPUT 'localhost:9200/my-index1/.percolator/tagname1' 
>
> to register each tag? In your implementation, are there any pitfalls or 
> nice tricks worth noting?
>
>
>
>
> On Wednesday, January 22, 2014 8:27:03 AM UTC+8, Binh Ly wrote:
>>
>> Arthur,
>>
>> You should be able to use filters in your percolator queries; for 
>> example, you can use a term/terms filter. Also, in ES 1.0 you can shard the 
>> percolator query index so that the percolation load is distributed 
>> across shards for better scalability. The best way is to experiment with it: 
>> http://www.elasticsearch.org/downloads/1-0-0-RC1.
>>
>> I actually worked for a company that did content classification this way, 
>> and the percolator was a perfect fit for that use-case.
>>
>> On Tuesday, January 21, 2014 10:01:36 AM UTC-5, Arthur Denning wrote:
>>>
>>> I am considering using the percolator API to classify documents, namely 
>>> by registering queries like "football" and "art" with the percolator, so 
>>> that when new documents are added the percolator returns the right tags. 
>>> My concern is: suppose there are thousands of tags to be identified this 
>>> way, would it be a performance nightmare? Are there thousands of queries 
>>> implicitly running behind the scenes?
>>>
>>> And what would be the recommended way to tackle this kind of 
>>> classification problem in Elasticsearch?
>>>
>>> It seems that Lucene has a classification API. Is it already integrated 
>>> elsewhere in Elasticsearch? Is there any roadmap concerning its 
>>> implementation?
>>>
>>>
