Re: Operators NEARx, BEFOR, AFTER, FIRSTx, LASTx

2015-03-17 Thread Petr Janský
No one? :-(

Petr

On Wednesday, February 18, 2015 at 12:35:15 UTC+1, Petr Janský wrote:

 Hi Lukas,

 thank you for your answer. I checked the Proximity Matching chapter - 
 match_phrase - and it's what I'm looking for. I'm just not able to find a way 
 to create queries like:

   1. Obama BEFORE Iraq - the first word (not term) appears before the second 
   in the field text
   2. "President Obama" AFTER Iraq - the phrase "President Obama" appears 
   after "Iraq" in the field text

 In other words, match_phrase doesn't have an in_order parameter like 
 span_near, and for span_near I have to use terms - I have to run the analyzer 
 on the words beforehand.
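
 For reference, this is the kind of proximity query I can already build (a minimal 
 sketch; the index name "idnes" and field name "text" are assumptions, not taken 
 from this thread). match_phrase with slop brings the words close together, but it 
 has no in_order flag:

 curl -X GET 'localhost:9200/idnes/_search?pretty' -d '{
   "query": {
     "match_phrase": {
       "text": {
         "query": "Obama Iraq",
         "slop": 50
       }
     }
   }
 }'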

 Do you have any idea how to implement these queries?

 Thanks
 Petr

On Monday, January 19, 2015 at 10:23:21 UTC+1, Lukáš Vlček wrote:

 Hi Petr,

 let me try to address some of your questions:

 ad 1) I am not sure I understand what you mean. If you want to use a span-type 
 query then simply use it instead of the query string query. In particular, 
 if you pass user input into the query it is recommended NOT to use the 
 query string query; you should consider using a different query type (like 
 a span query in your case).

 ad 2) Not sure I fully understand, but I can see a match for some of the 
 requested features in span queries - like slop. I would recommend 
 reading through the Proximity Matching chapters [1] to see how you can use 
 slop.

 ad 3) The input that goes into span queries can go through the text analysis 
 process (if I am not mistaken). The fact that there are term 
 queries behind the scenes does not mean you cannot run your analysis 
 first.
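
 To make that concrete, here is a minimal span_near sketch (the field name "text" and 
 the already-lowercased terms are assumptions; in practice you would run the user's 
 words through the field's analyzer first, e.g. via the _analyze API, and feed the 
 resulting terms into the span clauses). It only matches when "obama" occurs before 
 "iraq" within 50 positions:

 curl -X GET 'localhost:9200/idnes/_search?pretty' -d '{
   "query": {
     "span_near": {
       "clauses": [
         { "span_term": { "text": "obama" } },
         { "span_term": { "text": "iraq" } }
       ],
       "slop": 50,
       "in_order": true
     }
   }
 }'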

 Maybe if you can share some of your configs/documents/queries we can 
 help you more.

 [1] 
 http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/proximity-matching.html

 Regards,
 Lukas

 On Mon, Jan 19, 2015 at 10:02 AM, Petr Janský petr@6hats.cz wrote:

 No one? :-(

 Petr

 On Tuesday, January 13, 2015 at 15:37:18 UTC+1, Petr Janský wrote:

 Hi there,

 I'm looking for a way to expose span_near and span_first 
 functionality to users via a search box in a GUI that uses the query string query.

    1. Is there any easy way to do it?
    2. Will the Elasticsearch folks implement operators like NEARx, BEFOR, 
    AFTER, FIRSTx, LASTx to allow searching (via the query string) by:
       - a specific maximum word distance between keywords
       - the order of keywords
       - the position of a keyword in the field, counted from the start and end 
       of the field text
    3. Span queries only accept terms; is there a way to use 
    words that will be analyzed by a language analyzer (stemming etc.)?


 Thanks
 Petr



Re: Using shingle

2015-03-17 Thread Petr Janský
No one? :-(

Petr

On Friday, February 20, 2015 at 15:29:15 UTC+1, Petr Janský wrote:

 Hi there,

 I've tried to use the shingle filter to get bigrams and trigrams:

 curl -X POST 'localhost:9200/idnes/' -d '{
   "settings" : {
     "analysis" : {
       "filter": {
         "czech_stop": {
           "type":            "stop",
           "stopwords":       "_czech_",
           "ignore_case":     true,
           "remove_trailing": false
         },
         "czech_stop_ngram": {
           "type": "stop",
           "stopwords" : ["a", "i", "k", "o", "s", "u", "v", "z", "do",
             "co", "by", "do", "je", "mu", "mi", "mě", "mně", "mne", "na", "ne", "ní",
             "si", "se", "ta", "to", "té", "ti", "ty", "už", "ve", "za", "že", "aby",
             "ani", "ale", "byl", "jak", "jen", "jde", "kdo", "kdy", "kde", "něm",
             "nich", "něj", "než", "pro", "tak", "ten", "tam", "tady", "těch", "jsou",
             "jsem", "není", "nyní", "nimi", "jako", "jaká", "jaké", "jaká", "právě",
             "který", "která", "které", "jeho", "její", "nebo", "jako", "toho", "kdyby",
             "takový", "taková", "takové", "_czech_"],
           "ignore_case":     true,
           "remove_trailing": false
         },
         "czech_keywords": {
           "type":     "keyword_marker",
           "keywords": ["že"]
         },
         "czech_stemmer": {
           "type":     "stemmer",
           "language": "czech"
         },
         "shingle2_filter": {
           "type":             "shingle",
           "min_shingle_size": 2,
           "max_shingle_size": 2,
           "output_unigrams":  false
         },
         "shingle3_filter": {
           "type":             "shingle",
           "min_shingle_size": 3,
           "max_shingle_size": 3,
           "output_unigrams":  false
         }
       },
       "analyzer": {
         "shingle2s_analyzer": {
           "type":      "custom",
           "tokenizer": "standard",
           "filter":    ["standard", "lowercase", "czech_stop_ngram", "shingle2_filter"]
         },
         "shingle3s_analyzer": {
           "type":      "custom",
           "tokenizer": "standard",
           "filter":    ["czech_stop_ngram", "shingle3_filter"]
         }
       }
     }
   },

   "mappings" : {
     "article" : {
       "_id" : {
         "path" : "reference"
       },

       "properties" : {
         .
         "content2" : { "type": "string", "analyzer": "shingle2_analyzer"},
         "content3" : { "type": "string", "analyzer": "shingle3_analyzer"},
         "content4" : { "type": "string", "analyzer": "shingle2s_analyzer"},
         "content5" : { "type": "string", "analyzer": "shingle3s_analyzer"},
         ..

 If I try my analyzers by calling:

 curl -X GET 
 'localhost:9200/idnes/_analyze?analyzer=shingle3s_analyzer&pretty' -d 'a e 
 i o u s k z na ke ze nad pod za před Norská strana zatím dostatečně 
 nevyhodnotila, jak citlivou otázkou je pro Česko případ synů Evy 
 Michalákové. Tak popisuje současnou situaci premiér Bohuslav Sobotka. Ten 
 již dostal odpověď na dopis od premiérky Norska Erny Solbergové. S obecnými 
 odpověďmi není spokojen a zvažuje do Norska další psaní.' | grep token

 It works fine. In the results there are only trigrams:
    "tokens" : [ {
      "token" : "_ e _",
      "token" : "e _ _",
      "token" : "_ _ Norská",
      "token" : "_ Norská _",
      "token" : "Norská _ zatím",
      "token" : "_ zatím dostatečně",
      "token" : "zatím dostatečně nevyhodnotila",
      "token" : "dostatečně nevyhodnotila _",
      "token" : "nevyhodnotila _ citlivou",
      "token" : "_ citlivou otázkou",
      "token" : "citlivou otázkou _",
      "token" : "otázkou _ _",

 But there is an issue if I use it on indexed data:
 POST idnes/_search?pretty=true 
 {
   "query": {
     "match": {
       "content_type": "Article"
     }
   },
   "facets" : {
     "tag" : {
       "terms" : {
         "fields" : ["content5"],
         "size" : 20
       }
     }
   }
 }

 In the response there are also unigrams.
 "facets": {
    "tag": {
       "_type": "terms",
       "missing": 452,
       "total": 926077,
       "other": 762645,
       "terms": [
          {
             "term": "a",
             "count": 18150
          },
          {
             "term": "to",
             "count": 17131
          },
          {
             "term": "je",
             "count": 14090
          },
          {
             "term": "se",
             "count": 13621
          },
          {
             "term": "na",
             "count": 12285
          },
          ..
          {
             "term": "korun _ _",
             "count": 551
          },
          {
             "term": "_ _ případě",
             "count": 499
          },
          {
             "term": "zobrazení videa musíte",
             "count": 449
          }
          .


    1. Why does this happen?
    2. Is there any other way to skip the "_" filler tokens left by removed 
    stopwords than http://www.elasticsearch.org/blog/searching-with-shingles/, 
    which doesn't work for Lucene 4.4+?

 Thanks
 Petr




Using shingle

2015-02-20 Thread Petr Janský
Hi there,

I've tried to use the shingle filter to get bigrams and trigrams:

curl -X POST 'localhost:9200/idnes/' -d '{
  "settings" : {
    "analysis" : {
      "filter": {
        "czech_stop": {
          "type":            "stop",
          "stopwords":       "_czech_",
          "ignore_case":     true,
          "remove_trailing": false
        },
        "czech_stop_ngram": {
          "type": "stop",
          "stopwords" : ["a", "i", "k", "o", "s", "u", "v", "z", "do",
            "co", "by", "do", "je", "mu", "mi", "mě", "mně", "mne", "na", "ne", "ní",
            "si", "se", "ta", "to", "té", "ti", "ty", "už", "ve", "za", "že", "aby",
            "ani", "ale", "byl", "jak", "jen", "jde", "kdo", "kdy", "kde", "něm",
            "nich", "něj", "než", "pro", "tak", "ten", "tam", "tady", "těch", "jsou",
            "jsem", "není", "nyní", "nimi", "jako", "jaká", "jaké", "jaká", "právě",
            "který", "která", "které", "jeho", "její", "nebo", "jako", "toho", "kdyby",
            "takový", "taková", "takové", "_czech_"],
          "ignore_case":     true,
          "remove_trailing": false
        },
        "czech_keywords": {
          "type":     "keyword_marker",
          "keywords": ["že"]
        },
        "czech_stemmer": {
          "type":     "stemmer",
          "language": "czech"
        },
        "shingle2_filter": {
          "type":             "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 2,
          "output_unigrams":  false
        },
        "shingle3_filter": {
          "type":             "shingle",
          "min_shingle_size": 3,
          "max_shingle_size": 3,
          "output_unigrams":  false
        }
      },
      "analyzer": {
        "shingle2s_analyzer": {
          "type":      "custom",
          "tokenizer": "standard",
          "filter":    ["standard", "lowercase", "czech_stop_ngram", "shingle2_filter"]
        },
        "shingle3s_analyzer": {
          "type":      "custom",
          "tokenizer": "standard",
          "filter":    ["czech_stop_ngram", "shingle3_filter"]
        }
      }
    }
  },

  "mappings" : {
    "article" : {
      "_id" : {
        "path" : "reference"
      },

      "properties" : {
        .
        "content2" : { "type": "string", "analyzer": "shingle2_analyzer"},
        "content3" : { "type": "string", "analyzer": "shingle3_analyzer"},
        "content4" : { "type": "string", "analyzer": "shingle2s_analyzer"},
        "content5" : { "type": "string", "analyzer": "shingle3s_analyzer"},
        ..

If I try my analyzers by calling:

curl -X GET 
'localhost:9200/idnes/_analyze?analyzer=shingle3s_analyzer&pretty' -d 'a e 
i o u s k z na ke ze nad pod za před Norská strana zatím dostatečně 
nevyhodnotila, jak citlivou otázkou je pro Česko případ synů Evy 
Michalákové. Tak popisuje současnou situaci premiér Bohuslav Sobotka. Ten 
již dostal odpověď na dopis od premiérky Norska Erny Solbergové. S obecnými 
odpověďmi není spokojen a zvažuje do Norska další psaní.' | grep token

It works fine. In the results there are only trigrams:
   "tokens" : [ {
     "token" : "_ e _",
     "token" : "e _ _",
     "token" : "_ _ Norská",
     "token" : "_ Norská _",
     "token" : "Norská _ zatím",
     "token" : "_ zatím dostatečně",
     "token" : "zatím dostatečně nevyhodnotila",
     "token" : "dostatečně nevyhodnotila _",
     "token" : "nevyhodnotila _ citlivou",
     "token" : "_ citlivou otázkou",
     "token" : "citlivou otázkou _",
     "token" : "otázkou _ _",


But there is an issue if I use it on indexed data:
POST idnes/_search?pretty=true 
{
  "query": {
    "match": {
      "content_type": "Article"
    }
  },
  "facets" : {
    "tag" : {
      "terms" : {
        "fields" : ["content5"],
        "size" : 20
      }
    }
  }
}

In the response there are also unigrams.
   "facets": {
      "tag": {
         "_type": "terms",
         "missing": 452,
         "total": 926077,
         "other": 762645,
         "terms": [
            {
               "term": "a",
               "count": 18150
            },
            {
               "term": "to",
               "count": 17131
            },
            {
               "term": "je",
               "count": 14090
            },
            {
               "term": "se",
               "count": 13621
            },
            {
               "term": "na",
               "count": 12285
            },
            ..
            {
               "term": "korun _ _",
               "count": 551
            },
            {
               "term": "_ _ případě",
               "count": 499
            },
            {
               "term": "zobrazení videa musíte",
               "count": 449
            }
            .


   1. Why does this happen?
   2. Is there any other way to skip the "_" filler tokens left by removed 
   stopwords than http://www.elasticsearch.org/blog/searching-with-shingles/, 
   which doesn't work for Lucene 4.4+?

Thanks
Petr



Re: Operators NEARx, BEFOR, AFTER, FIRSTx, LASTx

2015-02-18 Thread Petr Janský
Hi Lukas,

thank you for your answer. I checked the Proximity Matching chapter - match_phrase 
- and it's what I'm looking for. I'm just not able to find a way to create 
queries like:

   1. Obama BEFORE Iraq - the first word (not term) appears before the second in 
   the field text
   2. "President Obama" AFTER Iraq - the phrase "President Obama" appears after 
   "Iraq" in the field text
   
In other words, match_phrase doesn't have an in_order parameter like 
span_near, and for span_near I have to use terms - I have to run the analyzer on 
the words beforehand.

Do you have any idea how to implement these queries?

Thanks
Petr

On Monday, January 19, 2015 at 10:23:21 UTC+1, Lukáš Vlček wrote:

 Hi Petr,

 let me try to address some of your questions:

 ad 1) I am not sure I understand what you mean. If you want to use a span-type 
 query then simply use it instead of the query string query. In particular, 
 if you pass user input into the query it is recommended NOT to use the 
 query string query; you should consider using a different query type (like 
 a span query in your case).

 ad 2) Not sure I fully understand, but I can see a match for some of the 
 requested features in span queries - like slop. I would recommend 
 reading through the Proximity Matching chapters [1] to see how you can use 
 slop.

 ad 3) The input that goes into span queries can go through the text analysis 
 process (if I am not mistaken). The fact that there are term 
 queries behind the scenes does not mean you cannot run your analysis 
 first.

 Maybe if you can share some of your configs/documents/queries we can help 
 you more.

 [1] 
 http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/proximity-matching.html

 Regards,
 Lukas

 On Mon, Jan 19, 2015 at 10:02 AM, Petr Janský petr@6hats.cz wrote:

 No one? :-(

 Petr

 On Tuesday, January 13, 2015 at 15:37:18 UTC+1, Petr Janský wrote:

 Hi there,

 I'm looking for a way to expose span_near and span_first 
 functionality to users via a search box in a GUI that uses the query string query.

    1. Is there any easy way to do it?
    2. Will the Elasticsearch folks implement operators like NEARx, BEFOR, 
    AFTER, FIRSTx, LASTx to allow searching (via the query string) by:
       - a specific maximum word distance between keywords
       - the order of keywords
       - the position of a keyword in the field, counted from the start and end 
       of the field text
    3. Span queries only accept terms; is there a way to use 
    words that will be analyzed by a language analyzer (stemming etc.)?


 Thanks
 Petr



Re: Operators NEARx, BEFOR, AFTER, FIRSTx, LASTx

2015-01-19 Thread Petr Janský
No one? :-(

Petr

On Tuesday, January 13, 2015 at 15:37:18 UTC+1, Petr Janský wrote:

 Hi there,

 I'm looking for a way to expose span_near and span_first functionality 
 to users via a search box in a GUI that uses the query string query.

    1. Is there any easy way to do it?
    2. Will the Elasticsearch folks implement operators like NEARx, BEFOR, 
    AFTER, FIRSTx, LASTx to allow searching (via the query string) by:
       - a specific maximum word distance between keywords
       - the order of keywords
       - the position of a keyword in the field, counted from the start and end 
       of the field text
    3. Span queries only accept terms; is there a way to use 
    words that will be analyzed by a language analyzer (stemming etc.)?


 Thanks
 Petr




Operators NEARx, BEFOR, AFTER, FIRSTx, LASTx

2015-01-13 Thread Petr Janský
Hi there,

I'm looking for a way to expose span_near and span_first functionality 
to users via a search box in a GUI that uses the query string query.

   1. Is there any easy way to do it?
   2. Will the Elasticsearch folks implement operators like NEARx, BEFOR, AFTER, 
   FIRSTx, LASTx to allow searching (via the query string) by:
      - a specific maximum word distance between keywords
      - the order of keywords
      - the position of a keyword in the field, counted from the start and end of 
      the field text (a span_first sketch follows below)
   3. Span queries only accept terms; is there a way to use words that will be 
   analyzed by a language analyzer (stemming etc.)?
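
For context, this is the kind of span_first query I would like to expose through the 
search box (a minimal sketch; the index and field names "idnes" and "text" are my 
assumptions here, not something defined in this thread). It matches documents where 
the term "obama" occurs within the first 5 positions of the field, i.e. a FIRSTx-style 
constraint:

curl -X GET 'localhost:9200/idnes/_search?pretty' -d '{
  "query": {
    "span_first": {
      "match": {
        "span_term": { "text": "obama" }
      },
      "end": 5
    }
  }
}'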


Thanks
Petr



Get X word before and after search word

2014-06-17 Thread Petr Janský
Hello,

I'm trying to find a way to get the words/terms around a search word. E.g., let's 
have a document with the text "The best search engine is ElasticSearch". I will 
search for "best" and want to get the information that the word "search" appears X 
positions after the search word.

Thx
Petr



Query join cross types

2014-05-27 Thread Petr Janský
Hello guys,

I'm trying to find a way to solve this case with two types:

"product" : {
    "properties" : {
        "kbi_id" : { "type" : "string", "index" : "not_analyzed"},
        "agreement_status" : { "type" : "string", "index" : "not_analyzed"},   - values Y/N
        "product_code" : { "type" : "string", "index" : "not_analyzed"},
        "date_valid" : { "type" : "string", "format" : "dd.MM."}
}}
~120M records


"note" : {
    "properties" : {
        "note" : { "type" : "string", "analyzer" : "cestina_hunspell"},
        "kbi_id" : { "type" : "string", "index" : "not_analyzed"},
        "created_date" : { "type" : "date", "format" : "dd.MM. HH:mm:ss"}
}},
~3M records

The kbi_id field is the client ID - the join field. I don't have clients in my index.

I would like to get all note records for which there exists at least one product 
record with agreement_status=Y and simultaneously at least one product record with 
agreement_status=N.

Any ideas?

Thx
Petr



elasticsearch-carrot2 JAVA API call

2014-03-05 Thread Petr Janský
Hello,

I use elasticsearch-carrot2 (https://github.com/carrot2/elasticsearch-carrot2) and I 
was not able to find out how to get the clustering result using the Java API, e.g. for

GET _search_with_clusters?pretty=true
{
  "query" : {
    "bool": {
      "must": [
        { "term": { "content": "zeman"}}
      ]
    }
  },
  "query_hint": "vodafone",
  "field_mapping": {
    "url":     ["_source.reference"],
    "title":   ["_source.title"],
    "content": ["_source.content"]
  }
}


The result:

{
   "took": 12,
   "timed_out": false,
   "_shards": {
      "total": 49,
      "successful": 49,
      "failed": 0
   },
   "hits": {
      "total": 2080562,
      "max_score": 1,
      "hits": [.]
   },
   "clusters": [
      {
         "id": 0,
         "score": 4.094963866819582,
         "label": "Možné Scénáře",
         "phrases": [
            "Možné Scénáře"
         ],
         "documents": [
            "http://www.denik.cz/z_domova/palestinsky-velvyslanec-zemrel-v-nemocnici-na-misto-jede-policejni-sef-cervicek.html#2114206",
            "http://www.denik.cz/z_domova/palestinsky-velvyslanec-zemrel-v-nemocnici-na-misto-jede-policejni-sef-cervicek.html#2114186",
            "http://www.denik.cz/z_domova/palestinsky-velvyslanec-zemrel-v-nemocnici-na-misto-jede-policejni-sef-cervicek.html#2114174"
         ]
      },
      
   ],
   "info": {
      "algorithm": "lingo",
      "search-millis": 11,
      "clustering-millis": 26,
      "total-millis": 38,
      "include-hits": true
   }
}


Is there any way to call this query using the Java API?

Thx
Petr



Re: elasticsearch-carrot2 JAVA API call

2014-03-05 Thread Petr Janský
Hi Dawid,

thank you for your reply. I tried the sample you sent me
https://github.com/carrot2/elasticsearch-carrot2/blob/master/src/test/java/org/carrot2/elasticsearch/ClusteringActionTests.java#L65
 

but I got:
Exception in thread "main" 
org.elasticsearch.common.util.concurrent.UncategorizedExecutionException: 
Failed execution
at 
org.elasticsearch.action.support.AdapterActionFuture.rethrowExecutionException(AdapterActionFuture.java:88)
at 
org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:49)
at 
es.search.ClusteringActionTests.testAttributes(ClusteringActionTests.java:85)
at es.search.ClusteringActionTests.main(ClusteringActionTests.java:61)
Caused by: java.lang.ClassCastException: java.lang.Boolean cannot be cast 
to java.util.Map
at 
org.elasticsearch.common.io.stream.StreamInput.readMap(StreamInput.java:319)
at 
org.carrot2.elasticsearch.ClusteringAction$ClusteringActionRequest.readFrom(ClusteringAction.java:407)
at 
org.elasticsearch.transport.netty.MessageChannelHandler.handleRequest(MessageChannelHandler.java:209)
at 
org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:109)
at 
org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at 
org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at 
org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at 
org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at 
org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at 
org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at 
org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at 
org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at 
org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at 
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at 
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at 
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at 
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at 
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at 
org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at 
org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

I use:
Elasticsearch 0.9.10
elasticsearch-carrot2 1.2.2
with the jars from elasticsearch-carrot2-1.2.0.zip

thx
Petr

On Wednesday, March 5, 2014 at 13:03:01 UTC+1, Dawid Weiss wrote:

 In short: yes. Perhaps the best way to see how to use the Java API 
 is to look at the unit tests. For example: 


 https://github.com/carrot2/elasticsearch-carrot2/blob/master/src/test/java/org/carrot2/elasticsearch/ClusteringActionTests.java#L65
  

 I would also use a language code field and mapping for Czech; this 
 should improve your clustering results (especially if you tweak the 
 default lexical resources in Carrot2... patches welcome :). 


 https://github.com/carrot2/elasticsearch-carrot2/blob/master/src/test/java/org/carrot2/elasticsearch/ClusteringActionTests.java#L100
  

 Dawid 

 On Wed, Mar 5, 2014 at 10:53 AM, Petr Janský petr@6hats.cz wrote: 
  Hello, 
  
  I use elasticsearch-carrot2 and I was not able to find out how to get the 
  clustering result using the Java API, e.g. for 
  
  GET _search_with_clusters?pretty=true 
  { 
    "query" : { 
      "bool": { 
        "must": [ 
          { "term": { "content": "zeman"}} 
        ] 
      } 
    }, 
    "query_hint": "vodafone", 
    "field_mapping": { 
      "url

Expanding terms

2014-02-25 Thread Petr Janský
Hello,

I'm trying to find a way to:

   1. expand a term - get all the words and counts that are relevant for a 
   term(s)
   2. get relevant words for a query - a list of all the words that 
   are highlighted
   3. get phrases by word - e.g. for the word "war": "world war", "second world 
   war", "the second world war", 

and a complicated one:

   1. is there a way to get/highlight only the words that are relevant 
   for multiple term conditions, e.g. 

"must": [
   { "wildcard": {
        "content_morfo": {
           "value": "v*"
        }
   }},
   { "wildcard": {
        "content_morfo": {
           "value": "==AA*=="
        }
   }}
]

thx
Petr
  
