Re: How to update/delete indexed documents from ES index using mysql jdbc river

2014-08-07 Thread coder
Here goes my use case:

Table t1:
id    a           b
123   someString  someString

Table t2:
id    c                 d                 e
123   someIntegerCount  someBooleanValue  someString

select * from t1, t2 where t1.id = t2.id and t2.c > 0 and t2.d = 1;

which gives rows such as:

id    a           b           c                 d                 e
123   someString  someString  someIntegerCount  someBooleanValue  someString
Now, in my use case the values of the c and d fields in table t2 change 
frequently, so I index only those rows for which c > 0 (since the count keeps 
changing) and d = 1 (which means enabled rather than disabled).

The first indexing run works without any issues. The problem comes when I 
update these two fields and want ES to reindex the documents: there may be 
documents for which c was 0 earlier but is now non-zero, and similarly d was 
0 earlier but has changed to 1, and I want ES to reflect those changes. (I 
guess that is what the mongo river does, and I expected the same automatic 
sync from the mysql river.) Also, there will be a few rows that matched 
earlier but no longer do. How do I delete those docs from the index?

How can I accomplish this?

I have tried to explain my problem as simply as possible. Please ask 
questions if anything is not clear.

Thanks 

On Thursday, 7 August 2014 14:25:07 UTC+5:30, Jörg Prante wrote:
>
> Can you give a minimal example of a query with the rows, and what rows are 
> deleted then, so we can work through the issue?
>
> The fundamental problem is that deleted rows in SQL are no longer available 
> for creating deletion requests, so they cannot be tracked over time - once 
> they are gone, they are gone. The problem is known as "stale data". This can 
> be solved either at a bigger scope (by using time-windowed indexes where 
> older indexes can be dropped) or by an extra DB mechanism that provides the 
> IDs of the deleted docs after they are deleted (maybe via a trigger), so 
> they can be selected by the JDBC plugin with a "select _optype, _id" 
> construction. Note that, at a certain size, deleting single docs in ES is 
> not efficient.
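>
> Purely as a sketch of that idea (the extra table, trigger name and column
> names below are made up, and the exact _optype value the plugin accepts
> should be checked against the plugin documentation):
>
> -- record the IDs of rows removed from t2, via an AFTER DELETE trigger
> CREATE TABLE t2_deleted (id INT NOT NULL, deleted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP);
> CREATE TRIGGER t2_track_deletes AFTER DELETE ON t2
>   FOR EACH ROW INSERT INTO t2_deleted (id) VALUES (OLD.id);
>
> -- a second river statement that turns those IDs into delete operations
> "sql" : "select 'delete' as _optype, id as _id from t2_deleted"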
>
> To sync data between the DB and ES, the JDBC plugin is probably not smart 
> enough (it is impossible to implement app-specific logic in the JDBC 
> plugin). So you should also consider writing a middleware app with specific 
> logic that controls the deletions in the DB and afterwards deletes the 
> corresponding docs in ES.
>
> Jörg
>
>
>
>
> On Thu, Aug 7, 2014 at 10:44 AM, coder wrote:
>
>> Hi,
>>
>> I'm using the jdbc mysql river plugin 
>> https://github.com/jprante/elasticsearch-river-jdbc for creating an ES 
>> index. I have been able to index my documents successfully, but I'm facing 
>> issues in updating/deleting indexed documents. My jdbc river runs a SQL 
>> query that joins multiple tables and returns results, which are then 
>> indexed in ES. My problem is that if I update some tables, the results of 
>> that join query change, and those changes should be reflected in the ES 
>> index, but the index is not being updated and stale results are not being 
>> deleted. I found a few threads where people are facing a similar issue.
>>
>>
>> http://stackoverflow.com/questions/21260086/elasticsearch-river-jdbc-mysql-not-deleting-records
>>
>> I'm using ES 1.1.0 with jdbc river version 1.1.0.2. There is another 
>> thread, from the author himself, where he states that deletions are no 
>> longer supported. He suggests two ways of tackling the issue: one is 
>> re-indexing, and the other is using SQL queries to update/delete indexed 
>> documents.
>> https://github.com/jprante/elasticsearch-river-jdbc/issues/202
>>
>> Can anyone please tell me how I can tackle this issue and update/delete 
>> already indexed documents? Can anyone elaborate on the second method of 
>> updating/deleting indexed documents?
>>
>> Thanks
>>
>
>



How to update/delete indexed documents from ES index using mysql jdbc river

2014-08-07 Thread coder
Hi,

I'm using the jdbc mysql river plugin 
https://github.com/jprante/elasticsearch-river-jdbc for creating an ES index. 
I have been able to index my documents successfully, but I'm facing issues in 
updating/deleting indexed documents. My jdbc river runs a SQL query that 
joins multiple tables and returns results, which are then indexed in ES. My 
problem is that if I update some tables, the results of that join query 
change, and those changes should be reflected in the ES index, but the index 
is not being updated and stale results are not being deleted. I found a few 
threads where people are facing a similar issue.

http://stackoverflow.com/questions/21260086/elasticsearch-river-jdbc-mysql-not-deleting-records

I'm using ES 1.1.0 with jdbc river version 1.1.0.2. There is another thread, 
from the author himself, where he states that deletions are no longer 
supported. He suggests two ways of tackling the issue: one is re-indexing, 
and the other is using SQL queries to update/delete indexed documents.
https://github.com/jprante/elasticsearch-river-jdbc/issues/202

Can anyone please tell me how I can tackle this issue and update/delete 
already indexed documents? Can anyone elaborate on the second method of 
updating/deleting indexed documents?

Thanks



Re: How to improve search performance in ES ?

2014-07-11 Thread coder
Hi Jörg,

I have seen these links. I'm using an ngram tokenizer. The issue I'm facing 
is slow response time, and I need some suggestions on how to improve it. Is 
there any way I can structure the query better? Also, I'm using a match 
query on a field in one of my filters, but I have read that term filters are 
more efficient. Can you give me some insight into how I can use a term 
filter in this case, even though the field I want to filter on is not 
present in all the documents?
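
For what it's worth, the shape I have been experimenting with is roughly this 
(the "name" and "category" field names are just placeholders for my own 
fields); the idea is that a missing field counts as a match, so documents 
without the field are not dropped by the term filter:

{
  "query": {
    "filtered": {
      "query": { "match": { "name": "user typed text" } },
      "filter": {
        "bool": {
          "should": [
            { "term":    { "category": "hotel" } },
            { "missing": { "field": "category" } }
          ]
        }
      }
    }
  }
}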

Thanks

On Saturday, 12 July 2014 00:09:50 UTC+5:30, Jörg Prante wrote:
>
> For autocompletion, you should use the completion suggester 
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-completion.html
>
> or edge ngram tokenizer
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html
>
> Jörg
>
>
> On Fri, Jul 11, 2014 at 8:11 PM, coder > 
> wrote:
>
>> Hi,
>>
>> I'm working on improving the search response of ES but not able to do 
>> anything. My scenario is something like this:
>>
>> I'm using 3 ES queries to get relevant results for my autocompleter.
>>
>> 1. A function score query with a match query  ( To get a correct match if 
>> user typed query is available in documents based on popularity)
>>
>> 2. A multi match query  (To handle those scenarios in which a user types 
>> some text which is present in different fields in a document since my 
>> documents are multi fields like name, address, city, state, country )
>>
>> 3. A query string (In order to ensure if I missed user query by the above 
>> type I'll be able to search using more powerful but less accurate query 
>> string)
>>
>> Along with all 3 queries, I'm using 4 filters (combined using an AND 
>> filter).
>>
>> My performance is really bad and I want to improve it while still 
>> delivering relevant results in my autocompleter.
>>
>> Can anyone help me improve this? Is there any way I can combine the 
>> queries for better performance?
>>
>> I have read that BOOL filters should be used instead of the AND filter, 
>> since they use bitsets which are cached internally. I think this gives one 
>> improvement: if the first query stores the filter bitsets, ES can reuse 
>> them in the other two queries. That will make things a little faster, but 
>> beyond that I'm not able to see how to improve the queries themselves.
>>
>> Is there any way by which I can combine match and multi-match queries ( 1 
>> and 2) into a single effective query.
>>
>> Also, in place of query_string should I use some other query for faster 
>> execution.
>>
>> Any suggestions are welcome. 
>> Thanks
>>
>>
>
>



How to improve search performance in ES ?

2014-07-11 Thread coder
Hi,

I'm working on improving the search response time of ES but have not been 
able to make much progress. My scenario is something like this:

I'm using 3 ES queries to get relevant results for my autocompleter.

1. A function score query wrapping a match query (to get a correct match, 
weighted by popularity, if the user-typed query is present in the documents)

2. A multi match query (to handle scenarios in which a user types text that 
is spread across different fields of a document, since my documents have 
multiple fields like name, address, city, state, country)

3. A query string query (to ensure that if the user query is missed by the 
above two, I can still search using the more powerful but less accurate 
query string)

Along with all 3 queries, I'm using 4 filters (combined using an AND 
filter).

My performance is really bad and I want to improve it while still delivering 
relevant results in my autocompleter.

Can anyone help me improve this? Is there any way I can combine the queries 
for better performance?

I have read that BOOL filters should be used instead of the AND filter since 
they use bitsets which are cached internally. I think this gives one 
improvement: if the first query stores the filter bitsets, ES can reuse them 
in the other two queries. That will make things a little faster, but beyond 
that I'm not able to see how to improve the queries themselves.
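
For instance, the change I have in mind (just a sketch, with made-up field 
names) is moving from an and filter to a bool filter so the per-filter 
bitsets can be cached and reused across the three queries:

{
  "query": {
    "filtered": {
      "query": { "match": { "name": "user typed text" } },
      "filter": {
        "bool": {
          "must": [
            { "term": { "country": "india" } },
            { "term": { "enabled": true } }
          ]
        }
      }
    }
  }
}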

Is there any way I can combine the match and multi-match queries (1 and 2) 
into a single, more effective query?

Also, should I use some other query in place of query_string for faster 
execution?

Any suggestions are welcome. 
Thanks




Re: How to index documents without location field ?

2014-07-08 Thread coder
Yes, I got it. You are right, Ivan. I should omit the field altogether; that 
way ES won't find the field and won't try to index it. I think that should 
work. I'll try it and let you know.

But how I can make use of that location field is also very important. Could 
you spare some time to answer the second part of the question? It would be a 
great help for me.

Thanks

On Tuesday, 8 July 2014 23:29:49 UTC+5:30, Ivan Brusic wrote:
>
> I made a very important mistake in my first response. What I meant to say 
> is
>
> Can you simply index the document WITHOUT the field entirely?
>
> What it appears you are doing is indexing a field with no value. One 
> solution would be to simply omit the field altogether. Elasticsearch/Lucene 
> is schema-less, so you can index a document that lacks a field the other 
> documents have. The mapper is only applied to fields that exist. If this is 
> not the case, can you provide a sample document?
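>
> For example (just a sketch with assumed index, type and field names), both
> of these should index fine against a geo_point mapping on "location", as
> long as the second document omits the field instead of sending an empty
> object:
>
> curl -XPUT 'localhost:9200/myindex/mytype/1' -d '{ "name" : "with location", "location" : { "lat" : 51.0, "lon" : 0.0 } }'
> curl -XPUT 'localhost:9200/myindex/mytype/2' -d '{ "name" : "without location" }'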
>
> -- 
> Ivan
>
>
> On Tue, Jul 8, 2014 at 10:52 AM, coder > 
> wrote:
>
>> Yes, I can index the documents which contain location field but not those 
>> documents which don't contain location field. It gives a parsing exception 
>> in that case and then stops importing documents. Is there anyway by which I 
>> can tell the ES that index location if it's present otherwise skip it ?
>>
>> Actually I have a mix of documents (some contains location and some are 
>> without location), now let's say,result of a user query is a document which 
>> don't contain location field, Now if I do a sorting this will put that 
>> document in the last/first position. In this case, it will work but not in 
>> other cases where actual result needs to be a location based document. So, 
>> sorting won't fix my issue. I don't know always whether to put location 
>> missing document on top/bottom position.
>>
>> For query Rescorer, I'm not able to understand how can I apply it to the 
>> top k results from the first query. How to rescore based on location ? 
>> On quick Googling I found this:
>>
>> "query": {
>>   "function_score": {
>> "functions": [
>>   { "gauss":  { "loc":   { "origin": "51,0", "scale": "5km" }}},
>> ]
>>   }
>> }
>>
>> But, this is something similar to sorting, right ? Also, there might be a 
>> case in which you gave some score based on location field to a document. 
>> Now another document exist without location field so in this case your 
>> default score will be 0 and chances are that the actual match was this 
>> document(without location) which should come on the top but due to 
>> rescoring its score has reduced and it will not appear on top. How to 
>> tackle this kind of problem ?
>>
>> What I need something like this: I should first query ES, then inspect 
>> the results and decide if I really need location based biasing based on the 
>> results. If results of a query is a document which don't contain location 
>> field, the I know location based biasing will badly affect results. So, I 
>> won't go for location based biasing but let's say I got a result which 
>> contains a location field, so I'll go for a location based biasing. 
>> Something like conditional biasing. Is there any way by which I can use the 
>> result of response again for manipulation because querying again will take 
>> a lot of time ?
>>
>> Accept my apology if I'm not making things a bit clearer. Please don't 
>> hesitate to ask further questions for clarity.
>> Thanks
>>
>>
>>
>>
>>
>>
>> On Tuesday, 8 July 2014 21:51:29 UTC+5:30, Ivan Brusic wrote:
>>
>>> In terms of the parsing exception, can you simply index the document 
>>> with the field entirely?
>>>
>>> As far as sorting goes, it makes sense to push the location-less 
>>> documents to the top or bottom. You lost me on the part regarding the 
>>> rescorer. Do you need the location-less documents to be returned in your 
>>> query?
>>>
>>> -- 
>>> Ivan
>>>
>>>
>>> On Tue, Jul 8, 2014 at 9:16 AM, coder  wrote:
>>>
>>>> HI,
>>>>
>>>> I need to index a mix of documents, some of which needs to be indexed 
>>>> using geo_point with a location fields but there are some other documents 
>>>> which don't contain location field. Whenever I do indexi

Re: How to index documents without location field ?

2014-07-08 Thread coder
Yes, I can index the documents which contain the location field, but not 
those which don't. In that case I get a parsing exception and the import 
stops. Is there any way I can tell ES to index the location if it's present 
and otherwise skip it?

Actually I have a mix of documents (some contain a location and some don't). 
Now let's say the result of a user query is a document which doesn't contain 
the location field; if I sort, this will put that document in the last/first 
position. In that case it works, but not in other cases where the actual 
result needs to be a location-based document. So sorting won't fix my issue: 
I don't always know whether to put documents missing the location at the top 
or the bottom.

For the query rescorer, I'm not able to understand how I can apply it to the 
top k results of the first query. How do I rescore based on location? 
On a quick Googling I found this:

"query": {
  "function_score": {
    "functions": [
      { "gauss": { "loc": { "origin": "51,0", "scale": "5km" } } }
    ]
  }
}

But this is something similar to sorting, right? Also, there might be a case 
in which one document gets some score based on its location field while 
another document has no location field; in that case its default score will 
be 0, and chances are that the actual match was this document (without a 
location), which should come out on top, but due to rescoring its score has 
been reduced and it will not appear on top. How do I tackle this kind of 
problem?

What I need is something like this: I should first query ES, then inspect the 
results and decide whether I really need location-based biasing. If the 
result of a query is a document which doesn't contain the location field, 
then I know location-based biasing will badly affect the results, so I won't 
apply it; but if I get a result which contains a location field, I will go 
for location-based biasing. Something like conditional biasing. Is there any 
way I can reuse the results of the response for this decision, because 
querying again will take a lot of time?

My apologies if I'm not making things clear. Please don't hesitate to ask 
further questions.
Thanks






On Tuesday, 8 July 2014 21:51:29 UTC+5:30, Ivan Brusic wrote:
>
> In terms of the parsing exception, can you simply index the document with 
> the field entirely?
>
> As far as sorting goes, it makes sense to push the location-less documents 
> to the top or bottom. You lost me on the part regarding the rescorer. Do 
> you need the location-less documents to be returned in your query?
>
> -- 
> Ivan
>
>
> On Tue, Jul 8, 2014 at 9:16 AM, coder > 
> wrote:
>
>> HI,
>>
>> I need to index a mix of documents, some of which needs to be indexed 
>> using geo_point with a location fields but there are some other documents 
>> which don't contain location field. Whenever I do indexing, I keep getting 
>> Mapper parsing exception with location={} during indexing and elasticsearch 
>> stops importing documents. Can anyone give me any idea how to work around 
>> the problem ? 
>>
>> Also, My requirement needs me to give search results based on location 
>> based biasing. I have read 
>> http://www.elasticsearch.org/blog/geo-location-and-search/ but not able 
>> to fit it in my usecase. Since I have a mix of documents, I can't simply 
>> sort on the basis of geo distance since it will put documents without 
>> location field either at top or at bottom while sorting. Can anyone tell me 
>> how can I apply sorting only on top k results. I have read about Query 
>> rescorer but I can't use it. I need to apply location based sorting only 
>> when a certain documents are returned. Based on that I need to make a 
>> decision whether to apply sorting or not. If I'll use rescorer it will 
>> rescore documents everytime. Is there anyway by which I can incorporate 
>> weight of location based score in normal score. Default can 0 and if some 
>> score x comes, use F(x). My problem is getting a proper formula for the 
>> case and how to use it in a query ?
>>
>> Looking forward for your help !!!
>>
>> Thanks,
>>

How to index documents without location field ?

2014-07-08 Thread coder
HI,

I need to index a mix of documents, some of which need to be indexed using 
geo_point with a location field, while other documents don't contain a 
location field at all. Whenever I index, I keep getting a mapper parsing 
exception with location={}, and elasticsearch stops importing documents. Can 
anyone give me an idea of how to work around this problem?

Also, my requirement is to return search results with location-based biasing. 
I have read http://www.elasticsearch.org/blog/geo-location-and-search/ but am 
not able to fit it to my use case. Since I have a mix of documents, I can't 
simply sort by geo distance, since that would put documents without a 
location field either at the top or at the bottom. Can anyone tell me how I 
can apply sorting only to the top k results? I have read about the query 
rescorer, but I can't use it: I need to apply location-based sorting only 
when certain documents are returned, and based on that I need to decide 
whether to apply sorting or not, whereas the rescorer would rescore documents 
every time. Is there any way I can incorporate the weight of a location-based 
score into the normal score? The default could be 0, and if some score x 
comes back, use F(x). My problem is finding a proper formula for this case 
and how to use it in a query.

Looking forward to your help!

Thanks,



Re: ElasticSearch 1.1.0 is slower compared to ES 0.90.3 ?

2014-06-09 Thread coder
Are you sure about JDK 1.7? My guess is that one should use JDK 1.7 with ES 
1.2.0. Also, it was working fine earlier with ES 0.90.3. How can the JDK 
version affect search query time?

On Tuesday, 10 June 2014 03:56:53 UTC+5:30, Mark Walkom wrote:
>
> You should *not* be using anything below JDK 1.7.
> Upgrade to at least 1.7u25, preferably 1.7u55, and Oracle rather than OpenJDK.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>  
>
> On 10 June 2014 02:34, coder wrote:
>
>> Hi,
>>
>> I was using ES 0.90.3 in production earlier, but after moving to ES 1.1.0 
>> a search query takes longer to fetch results. I'm using the same mapping 
>> and the same search query. The only difference is that I'm using JDK 1.6 - 
>> should I migrate to 1.7? Also, how do sharding and replicas affect search 
>> time? Can anyone tell me the major factors which affect search time?
>>
>> Thanks
>>
>
>



ElasticSearch 1.1.0 is slower compared to ES 0.90.3 ?

2014-06-09 Thread coder
Hi,

I was using ES 0.90.3 in production earlier, but after moving to ES 1.1.0 a 
search query takes longer to fetch results. I'm using the same mapping and 
the same search query. The only difference is that I'm using JDK 1.6 - should 
I migrate to 1.7? Also, how do sharding and replicas affect search time? Can 
anyone tell me the major factors which affect search time?

Thanks



How does ES multi match query with a type cross_field do analysis of given string ?

2014-05-31 Thread coder
Hi,

I was reading the ES docs on the multi-match query with type cross_fields, 
but I'm a bit confused about how it analyzes the given input string. 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#type-cross-fields

The cross_fields type tries to solve these problems at query time by taking 
a *term-centric* approach. It first analyzes the query string into 
individual terms, then looks for each term in any of the fields, as though 
they were one big field.

So when it analyzes the query into individual terms, which analyzer and 
filters does it use? Can we configure it with a specific analyzer and 
filters at index time on the fields in the mapping?

I think we can now use multi-match as a replacement for query_string? 
Earlier, multi-match would match a document only if the analyzed query was 
present entirely in one field, but now we can search for a document that 
contains all the query terms, no matter whether they appear in one field or 
across a combination of fields. Am I correct?
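
For example, the kind of query I have in mind is something like this (the 
field names are just placeholders for my own mapping):

{
  "multi_match": {
    "query":    "palo alto california",
    "type":     "cross_fields",
    "fields":   [ "name", "city", "state", "country" ],
    "operator": "and"
  }
}

My understanding is that with "operator": "and", every analyzed term has to 
appear in at least one of the fields, but not necessarily all in the same 
field.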

Thanks




Re: Need suggestions on type of query to be used for a given analysis for better results?

2014-05-31 Thread coder
I'd be thankful if someone could give me an idea about this.

Thanks

On Saturday, 31 May 2014 00:45:21 UTC+5:30, coder wrote:
>
> Hi, 
>
> I'm using following analyzers for indexing my documents in ES:
>
> "analysis" : {
>"analyzer" : {
>   "str_search_analyzer" : {
>   "tokenizer" : "standard",
>   "filter" : ["lowercase","asciifolding"]
>},
>"str_index_analyzer" : {
>  "tokenizer" : "standard",
>  "filter" : ["lowercase","asciifolding","edgengram"]
>  }
>},
>"filter" : {
>   "edgengram" : {
>   "type" : "edgeNGram",
>   "min_gram" : 3,
>   "max_gram" : 20,
>   "side" : "front"
>   }
>   }
>   }
>
> I'm sure the search and index analyzers can serve my purpose well, but 
> querying documents in the right manner is also necessary for good results. 
> I have read about the different queries that ES provides, but I'm confused 
> about which query or combination of queries works well with my use case.
>
> Let's say I have a document which contains 3 fields:
> city_name: Palo Alto 
> state_name: California
> country: United States
>
> Now, my index analyzer will create following tokens on these 3 fields:
> city_name: pal, palo, alt, alto
> state_name: cal, cali, calif, califo, califor, californ, californi, 
> california
> country: uni, unit, unite, united, sta, stat, state, states
>
> And a user searches for something like: palo alt
> Now, my search analyzer will analyze it into: palo, alt
>
> Now, I want to return only those documents which contain both these 
> tokens, either in the same field (like state, city or country) or across a 
> combination of 2 fields (not those documents where only palo, pal or alt 
> is present).
>
> Now which query can give me better results with this type of indexing and 
> searching?
>
> I read about the terms query, but that works on not-analyzed fields. Also, 
> query_string will generate some built-in regex queries for searching (I 
> don't want that). I want only those documents where all the tokens of a 
> user-searched query are present, either in the same field or in multiple 
> fields within the same document. How can I achieve this?
>
> Any idea ?
>
> Thanks
>
>



Need suggestions on type of query to be used for a given analysis for better results?

2014-05-30 Thread coder
Hi, 

I'm using following analyzers for indexing my documents in ES:

"analysis" : {
   "analyzer" : {
  "str_search_analyzer" : {
  "tokenizer" : "standard",
  "filter" : ["lowercase","asciifolding"]
   },
   "str_index_analyzer" : {
 "tokenizer" : "standard",
 "filter" : ["lowercase","asciifolding","edgengram"]
 }
   },
   "filter" : {
  "edgengram" : {
  "type" : "edgeNGram",
  "min_gram" : 3,
  "max_gram" : 20,
  "side" : "front"
  }
  }
  }

I'm sure the search and index analyzers can serve my purpose well, but 
querying documents in the right manner is also necessary for good results. 
I have read about the different queries that ES provides, but I'm confused 
about which query or combination of queries works well with my use case.

Let's say I have a document which contains 3 fields:
city_name: Palo Alto 
state_name: California
country: United States

Now, my index analyzer will create following tokens on these 3 fields:
city_name: pal, palo, alt, alto
state_name: cal, cali, calif, califo, califor, californ, californi, 
california
country: uni, unit, unite, united, sta, stat, state, states

And a user searches for something like: palo alt
Now, my search analyzer will analyze it into: palo, alt

Now, I want to return only those documents which contain both these tokens, 
either in the same field (like state, city or country) or across a 
combination of 2 fields (not those documents where only palo, pal or alt is 
present).

Now which query can give me better results with this type of indexing and 
searching?

I read about the terms query, but that works on not-analyzed fields. Also, 
query_string will generate some built-in regex queries for searching (I 
don't want that). I want only those documents where all the tokens of a 
user-searched query are present, either in the same field or in multiple 
fields within the same document. How can I achieve this?
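
One shape I have been considering (just a sketch, using the field names 
above) is one clause per user-typed token, so that a document must contain 
every token somewhere, whether in one field or spread across fields:

{
  "bool": {
    "must": [
      { "multi_match": { "query": "palo", "fields": [ "city_name", "state_name", "country" ] } },
      { "multi_match": { "query": "alt",  "fields": [ "city_name", "state_name", "country" ] } }
    ]
  }
}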

Any idea ?

Thanks



Re: ElasticSearch 1.1.0 equivalent to field query filter

2014-05-21 Thread coder
Just one point of confusion: let's say I want to query on a field which 
contains a boolean value, either true or false. Writing a query string with 
field = "true" won't find the documents that contain true for that boolean 
field. How can I achieve this? The string "true" is different from the 
boolean true, right?
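
What I have in mind is something like the following (just a sketch, where 
FIELD stands for my actual boolean field), with the filter value given as a 
JSON boolean rather than a string:

{
  "query": {
    "filtered": {
      "query":  { "match_all": {} },
      "filter": { "term": { "FIELD": true } }
    }
  }
}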

On Thursday, 22 May 2014 03:57:05 UTC+5:30, Ivan Brusic wrote:
>
> The field query is basically the query string query with the field 
> explicitly defined:
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/query-dsl-field-query.html
>
> The best resources for the Java API is to look at the unit tests and the 
> equivalent REST API classes. For example, RestSearchAction, 
>  RestIndexAction, RestClusterHealthAction, etc.
>
> Cheers,
>
> Ivan
>
>
> On Wed, May 21, 2014 at 10:46 AM, coder wrote:
>
>> Hi,
>>
>> I recently migrated from ES 0.90.3 to ES 1.1.0 but facing few issues 
>> while migration.
>>
>> 1. I was earlier using a field query filter based on the value of a field 
>> which is present in my document. Now, in ES 1.1.0 there is no such filter. 
>> Can anyone tell me what equivalent filter can be used. For eg, let's say I 
>> have a boolean field FIELD in my document and I want to put a filter based 
>> on the value of documents. So, I want only those documents for which this 
>> is TRUE. How can I achieve this in ES 1.1.0. 
>>
>> 2. I searched on forums about this issue. At one place I found field 
>> query is effectively same as query_string. I'm not getting this concept 
>> completely. Can anyone please explain this with some example.
>>
>> 3. Also, I'm exploring what all features are added in this new version 
>> but not able to find any proper java documentation. I found one link but 
>> that also is not complete. Any pointer to good links. Let's say I want to 
>> explore some particular query like function score, how can I find what all 
>> methods are there equivalent to query dsl in java implementation.
>>
>> Thanks
>>
>
>



ElasticSearch 1.1.0 equivalent to field query filter

2014-05-21 Thread coder
Hi,

I recently migrated from ES 0.90.3 to ES 1.1.0 but am facing a few issues 
with the migration.

1. I was earlier using a field query filter based on the value of a field 
present in my document. In ES 1.1.0 there is no such filter. Can anyone tell 
me what equivalent filter can be used? For example, let's say I have a 
boolean field FIELD in my document and I want to filter on its value, so 
that I only get those documents for which it is TRUE. How can I achieve this 
in ES 1.1.0?

2. I searched the forums about this issue. In one place I found that the 
field query is effectively the same as query_string. I'm not getting this 
concept completely. Can anyone please explain it with an example?

3. Also, I'm exploring what features were added in this new version but am 
not able to find proper Java documentation. I found one link, but it is also 
incomplete. Any pointers to good links? Let's say I want to explore a 
particular query, such as function score: how can I find what methods exist 
in the Java implementation equivalent to the query DSL?

Thanks



How to create ES index with a mapping and data from Mysql ?

2014-05-17 Thread coder
Hi,

I'm using ES to create an index with a mapping and data from MySQL. I'm able 
to create an empty index along with mappings and settings, but I'm not able 
to import data from MySQL into that index. I'm using the plugin 
https://github.com/jprante/elasticsearch-river-jdbc with ES 1.1.0, so there 
is no version problem. The example shows something like this:

curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
"type" : "jdbc",
"jdbc" : {
"url" : "jdbc:mysql://localhost:3306/test",
"user" : "",
"password" : "",
"sql" : "select * from orders"
}
}'

But how can I change it to use a specific index INDEX and type TYPE?

curl -XPUT "localhost:9200/_river/INDEX/_meta" -d '
{
"type": "jdbc",
"jdbc": {
"url": "jdbc:mysql://localhost:3306/my_db",
"driver": "com.mysql.jdbc.Driver",
"user" : "root",
"password": "root",
"sql" : "select * from my_table"
},
"index": {
"index" : "INDEX",
"type" : "TYPE"
}
}'

But this is not working for me.

Please help me.

Thanks




ElasticSearch Version problem

2014-05-16 Thread coder
Hi, 

I need to index documents from MySQL into ES. I read about the jdbc river, 
but it is compatible with ES 1.1.0. My problem is that I'm also using 
mongodb for indexing some documents, and that with ES version 0.90.3. I know 
it's a bit of an older version, which is why I want to upgrade to a newer 
one, but I'm not able to find a suitable mongo river plugin for ES 1.1.0. 
Can anyone give me a pointer to where I can get a mongo river plugin for ES 
1.1.0, or a jdbc river plugin for ES 0.90.3?

Thanks,



query_string bug in Elasticsearch-0.90.3, please tell me if it really is a bug ?

2014-02-04 Thread coder
I started using the explain api for query_string, and I think in the process 
I found a bug (I don't know if it really is a bug or the intended behaviour 
of query_string). This is going to be a long post, so please be patient with 
me.

I'm using a doc: {name:"new delhi to goa", st:"goa"}
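
The analyze api call I used was of roughly this form (the index and analyzer 
names here are placeholders for whatever the mapping actually defines):

curl -XGET 'localhost:9200/myindex/_analyze?analyzer=my_index_analyzer&pretty=true' -d 'new delhi to goa'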
For the indexed text it gave me these tokens:

{
  "tokens" : [ {
"token" : "new",
"start_offset" : 0,
"end_offset" : 3,
"type" : "word",
"position" : 1
  }, {
"token" : "new",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 1
  }, {
"token" : "new ",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 1
  }, {
"token" : "new d",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 1
  }, {
"token" : "new de",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 1
  }, {
"token" : "new del",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 1
  }, {
"token" : "new delh",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 1
  }, {
"token" : "new",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new ",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new d",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new de",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new del",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new delh",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi ",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi t",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi to",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new ",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new d",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new de",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new del",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new delh",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi ",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi t",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi to",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi to ",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi to g",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi to go",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi to goa",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "del",
"start_offset" : 4,
"end_offset" : 9,
"type" : "word",
"position" : 2
  }, {
"token" : "delh",
"start_offset" : 4,
"end_offset" : 9,
"type" : "word",
"position" : 2
  }, {
"token" : "delhi",
"start_offset" : 4,
"end_offset" : 9,
"type" : "word",
"position" : 2
  }, {
"token" : "del",
"start_offset" : 4,
"end_offset" : 12,
"type" : "word",
"position" : 2
  }, {
"token" : "delh",
"start_offset" : 4,
"end_offset" : 12,
"type" : "word",
"position" : 2
  }, {
"token" : "delhi",
"start_offset" : 4,
"end_offset" : 12,
"type" : "word",
"position" : 2
  }, {
"token" : "delhi ",
"start_offset" : 4,
"end_offset" : 12,
"type" : "word",
"position" : 2
  }, {
"token" : 

What is the meaning of field^4 from scoring perspective ?

2014-02-03 Thread coder
Hi,

I was going through the documentation of ES where they have mentioned that 
we can use boost for a certain field.

Boosting

Use the *boost* operator ^ to make one term more relevant than another. For 
instance, if we want to find all documents about foxes, but we are 
especially interested in quick foxes:

quick^2 fox

The default boost value is 1, but can be any positive floating point 
number. Boosts between 0 and 1 reduce relevance.

Boosts can also be applied to phrases or to groups:

"john smith"^2   (foo bar)^4

My doubt is about the meaning of ^2: is it like multiplying that term's score 
by a factor of 2, or multiplying the whole document score by 2?
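
For context, the place where I am running into this is per-field boosting, 
e.g. something like the following (field names are only examples), where, as 
I read it, a match in title contributes four times as much to the score as 
the same match in body:

{
  "query_string": {
    "query":  "quick fox",
    "fields": [ "title^4", "body" ]
  }
}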

Thanks




What is the difference between query_string and multi-match for querying docs ?

2014-02-03 Thread coder
Hi,

Can anyone please explain the difference between these two queries in 
detail? I studied the documentation but was not able to figure out the 
difference between the two. Can anyone explain it with some examples, in 
more detail? I expect query_string to give me the docs that match the 
maximum number of the terms generated by the search_analyzer against the 
indexed docs, but it is not happening that way.

Please help !!!

Thanks



Need help with ES Query

2014-02-02 Thread coder
Hi,

I'm using an analyzer which consists of a standard tokenizer with lowercase, 
asciifolding, suggestions_shingle and edgengram token filters. The analyzer 
is the same for both indexing and searching. So a text like "delhi to goa" 
will be analyzed like this:

{
  "tokens" : [ {
"token" : "de",
"start_offset" : 0,
"end_offset" : 5,
"type" : "word",
"position" : 1
  }, {
"token" : "del",
"start_offset" : 0,
"end_offset" : 5,
"type" : "word",
"position" : 1
  }, {
"token" : "delh",
"start_offset" : 0,
"end_offset" : 5,
"type" : "word",
"position" : 1
  }, {
"token" : "delhi",
"start_offset" : 0,
"end_offset" : 5,
"type" : "word",
"position" : 1
  }, {
"token" : "de",
"start_offset" : 0,
"end_offset" : 8,
"type" : "word",
"position" : 1
  }, {
"token" : "del",
"start_offset" : 0,
"end_offset" : 8,
"type" : "word",
"position" : 1
  }, {
"token" : "delh",
"start_offset" : 0,
"end_offset" : 8,
"type" : "word",
"position" : 1
  }, {
"token" : "delhi",
"start_offset" : 0,
"end_offset" : 8,
"type" : "word",
"position" : 1
  }, {
"token" : "delhi ",
"start_offset" : 0,
"end_offset" : 8,
"type" : "word",
"position" : 1
  }, {
"token" : "delhi t",
"start_offset" : 0,
"end_offset" : 8,
"type" : "word",
"position" : 1
  }, {
"token" : "delhi to",
"start_offset" : 0,
"end_offset" : 8,
"type" : "word",
"position" : 1
  }, {
"token" : "de",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "del",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "delh",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "delhi",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "delhi ",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "delhi t",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "delhi to",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "delhi to ",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "delhi to g",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "delhi to go",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "delhi to goa",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "to",
"start_offset" : 6,
"end_offset" : 8,
"type" : "word",
"position" : 2
  }, {
"token" : "to",
"start_offset" : 6,
"end_offset" : 12,
"type" : "word",
"position" : 2
  }, {
"token" : "to ",
"start_offset" : 6,
"end_offset" : 12,
"type" : "word",
"position" : 2
  }, {
"token" : "to g",
"start_offset" : 6,
"end_offset" : 12,
"type" : "word",
"position" : 2
  }, {
"token" : "to go",
"start_offset" : 6,
"end_offset" : 12,
"type" : "word",
"position" : 2
  }, {
"token" : "to goa",
"start_offset" : 6,
"end_offset" : 12,
"type" : "word",
"position" : 2
  }, {
"token" : "go",
"start_offset" : 9,
"end_offset" : 12,
"type" : "word",
"position" : 3
  }, {
"token" : "goa",
"start_offset" : 9,
"end_offset" : 12,
"type" : "word",
"position" : 3
  } ]
}

Now, the problem I'm facing is that when querying for "delhi t" I'm not 
getting the documents which contain the maximum number of matches for the 
analyzed tokens of "delhi t" at the top; instead I get docs which contain 
only "delhi" at the top.

I thought ES ranks highest the docs which have the maximum match for the 
analyzed search text, but that is not happening here. Can anyone please tell 
me why it is not working? Is there any other query type, like a match or 
boolean query, which I need to use?
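
For example, would a plain match query with an "and" operator (or a 
minimum_should_match percentage) be the right direction here? Something like 
this, where the field name is a placeholder for my actual field:

{
  "match": {
    "text": {
      "query":    "delhi t",
      "operator": "and"
    }
  }
}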

Any help will be appreciated.

Thanks




Which query type gives me docs which contains maximum number of user typed terms ?

2014-02-01 Thread coder
Hi,

I want to use a query type which gives me the docs with the maximum number 
of terms matching the user-typed terms. I'm using a standard tokenizer along 
with lowercase, asciifolding, suggestions_shingle and edgengram token 
filters for both the index and search analyzer. Right now I'm using a custom 
score query, but it is not giving me the docs which contain the 
exact/maximum number of user-typed terms. For example,

If I search for "delhi t", it gives me:

delhi
delhi 6
delhi to new york  -> after analysis this contains more matching terms
delhi 7
delhi to hyderabad -> same case here

But in my docs collection I have many delhi to  docs, so ideally it should 
give me those docs which contain "delhi t" as one of the terms, since there 
is an exact match.

Is there any way I can change my custom score query to achieve this?



Re: What are good combinations of search analyzer, index analyzer and query for implementing an effective autocompleter using ElasticSearch ?

2014-02-01 Thread coder
Jörg,

The second link is not working. The completion suggester is a good thing, 
but I guess it is restricted to prefix queries only: you have to provide 
every possible variant of the user-typed query as an input for a document to 
be matched. Please correct me if I'm wrong.

Thanks

On Sunday, 2 February 2014 00:03:09 UTC+5:30, Jörg Prante wrote:
>
> There is massive effort to implement autosuggest completion in most 
> convenient ways. 
>
> Since 0.90.3, there is the Lucene suggester implemented in ES 
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-completion.html
>
> The Lucene FST is faster and more compact than n-grams and may serve most 
> use cases well.
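>
> As a minimal sketch (index, type and field names are only examples), the
> field is mapped with type "completion" and then queried through _suggest:
>
> curl -XPUT 'localhost:9200/myindex/mytype/_mapping' -d '{
>   "mytype" : {
>     "properties" : {
>       "suggest" : { "type" : "completion" }
>     }
>   }
> }'
>
> curl -XPOST 'localhost:9200/myindex/_suggest' -d '{
>   "my-suggestion" : {
>     "text" : "del",
>     "completion" : { "field" : "suggest" }
>   }
> }'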
>
> But there is no general solution to autocomplete, like for search in 
> general. It depends on the words in the index and how to search them. E.g. 
> for german language, you probably need extra analysis for normalization 
> forms, like decompounding and baseform reduction, to better support what 
> the user wants.
>
> If you look at (older) solutions that do not use Lucene FST, you can use 
> edgeNgram, a linguistic method that takes considerably more space. A demo 
> is here
>
>
> http://jprante.github.io/applications/2012/08/17/Autocompletion-with-jQuery-JAX-RS-and-Elasticsearch.htm
>
> Jörg
>
>



What are good combinations of search analyzer, index analyzer and query for implementing an effective autocompleter using ElasticSearch ?

2014-02-01 Thread coder
Hi,

Elasticsearch is a powerful tool, no doubt but sometimes it can really make 
you cry when you are not able to find out a good combination of index and 
search analyzers along with a good query type for implementing an 
autocompleter. I have read and searched on the Internet that this 
combination differs from one use case to another, and that the only way to 
find it is to try and test it yourself. But can't we generalize this use 
case to some extent?

I'll take a base case where a document contains a title as a string and a 
description as a string. Mostly, people implement an autocompleter around 
such docs. The basic expectation of an autocompleter is to find the most 
appropriate documents corresponding to a user query. A good autocompleter 
returns docs which exactly match the user query, but since the user query 
can vary a lot from the actual content, the best an autocompleter can do is 
to return the docs which contain the maximum number of user-typed terms, and 
that is accomplished by a good query mechanism. So can't we generalize a 
good combination for this base case? After that, people can just extend the 
base case with the other parameters of their docs.

I think people who have spent time with ElasticSearch are aware of pros and 
cons of almost every possible combination of these things. So, it can be a 
good way to start a thread where people can actually share their thoughts, 
experiences and suggestions on different possible combination of analyzers 
and query types so that beginners don't have to struggle a lot initially 
with using ElasticSearch.

I'll start by sharing my combination (obviously it is not the best one; I'm 
still working on improving the effectiveness of my autocompleter):

I have used a standard tokenizer along with the token filters lowercase, 
asciifolding, suggestion_shingle and edgengrams (front side). I have used 
the same analyzers for both searching and indexing. For the query type, I'm 
using a custom score query, but somehow the results are not that well tuned. 
I expect my autocompleter to give the documents which contain the maximum 
number of matching terms from a user-typed query, but it's not giving 
results that way. I'm still working on fine-tuning it.

I think the above combination solves the problem to a certain extent, but there are still a lot of other ways to go about it that I'm not aware of.

I request you to please share suggestions, views and personal experiences of tackling this particular problem.

Thanks 



What will be the equivalent query of the following java api ?

2014-01-31 Thread coder
Hi,

I have the following Java code which queries for search results.

String script = "_score * (doc['po'].empty ? 1 : doc['po'].value == 0.0 ? 1 : doc['po'].value)";
QueryBuilder queryBuilder = QueryBuilders.customScoreQuery(
        QueryBuilders.queryString(query)
                .field("text", 30)
                .field("ad")
                .field("st")
                .field("cn")
                .field("co")
                .defaultOperator(Operator.AND))
        .script(script);

Can anyone tell me what the equivalent query DSL would be, so I can run it together with the explain API?

I tried using this but it is not working:

curl -XGET localhost:9200/acqindex/_search&pretty=true&explain=true -d '{
"query":{
"custom_score": {
"query": {  
"query": "hotel in"
},
"script" : "_score * doc['po'].value"
}
}
}'
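
For reference, here is a guess at what the equivalent DSL might look like (an untested sketch: the fields, boost and script are copied from the Java code above, and the body is kept in a separate file to avoid shell quoting problems with the single quotes inside the script):

curl -XGET 'localhost:9200/acqindex/_search?pretty=true&explain=true' -d @query.json

where query.json contains:

{
  "query": {
    "custom_score": {
      "query": {
        "query_string": {
          "query": "hotel in",
          "fields": ["text^30", "ad", "st", "cn", "co"],
          "default_operator": "AND"
        }
      },
      "script": "_score * (doc['po'].empty ? 1 : doc['po'].value == 0.0 ? 1 : doc['po'].value)"
    }
  }
}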

Thanks





ElasticSearch not able to find documents which contains the searched text.

2014-01-31 Thread coder
Hi,

I have following mapping:

curl -XPUT 'http://localhost:9200/acqindex/' -d '{
  "settings" : {
    "index" : {
      "number_of_shards" : 1,
      "number_of_replicas" : 1,
      "analysis" : {
        "analyzer" : {
          "str_search_analyzer" : {
            "tokenizer" : "standard",
            "filter" : ["lowercase", "asciifolding", "suggestions_shingle", "edgengram"]
          },
          "str_index_analyzer" : {
            "tokenizer" : "standard",
            "filter" : ["lowercase", "asciifolding", "suggestions_shingle", "edgengram"]
          }
        },
        "filter" : {
          "suggestions_shingle" : {
            "type" : "shingle",
            "min_shingle_size" : 2,
            "max_shingle_size" : 5
          },
          "edgengram" : {
            "type" : "edgeNGram",
            "min_gram" : 2,
            "max_gram" : 30,
            "side" : "front"
          }
        }
      }
    }
  }
}'

Also, the following data is indexed in Elasticsearch:

curl -XPUT 'localhost:9200/acqindex/acqidx/1' -d '{ text:"Hotels in Hoscur" 
}',
curl -XPUT 'localhost:9200/acqindex/acqidx/2' -d '{ text:"Hotels in 
innsburg" }'
curl -XPUT 'localhost:9200/acqindex/acqidx/3' -d '{ text:"Hotels in ink" }'
curl -XPUT 'localhost:9200/acqindex/acqidx/4' -d '{ text:"Hotels in 
houston" }'
curl -XPUT 'localhost:9200/acqindex/acqidx/4' -d '{ text:"Hotels in 
darjeling" }'
curl -XPUT 'localhost:9200/acqindex/acqidx/5' -d '{ text:"Hotels in darjel" 
}'
curl -XPUT 'localhost:9200/acqindex/acqidx/6' -d '{ text:"Hotels in india" 
}'

Now, if I query like this:

curl -XGET localhost:9200/acqindex/_search -d '{
"query": {  
"query_string" : {
"query": "hotel"
}
}
}'

It is not able to find any document, even though my index analyzer generates "hotel" as one of the tokens. So why is Elasticsearch not able to find the documents listed above?
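
One way to double-check what the analyzers actually emit is the _analyze API (a sketch; the analyzer name is taken from the settings above):

curl -XGET 'localhost:9200/acqindex/_analyze?analyzer=str_search_analyzer&pretty=true' -d 'hotel'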

Also, I have other doubts regarding ElasticSearch Query mechanism:

1. Whenever I query for "hotels", Elasticsearch gives me all the docs listed above, but the order is biased towards docs containing more occurrences of "ho" or "in". Does Elasticsearch give more weight to docs in which a particular token such as "ho" occurs more often?

2. Also, sometimes Elasticsearch does not return the docs containing the exact phrase at the top; instead it keeps showing other docs that contain extraneous words. How can I tackle this? I can't search only for the exact user-typed phrase, since there are many ways to ask for the same thing. I want to match all possible docs, but bias the results so that docs containing the exact phrase appear at the top and the rest of the matches come below (see the sketch after this list). For example, if I search for hotel oberoi in delhi

then I want results like:

hotel oberoi in delhi
hotel oberoi
hotel in delhi
hotels in delhi

3. Is there any query type that is generally used for autocompleters based on Elasticsearch? Or does anyone have a better idea of how to query Elasticsearch for this?

4. Is there anything like machine learning in the query mechanism, so that I can understand what the user wants while typing and give better results?
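
Regarding point 2, one common pattern (just a sketch, not necessarily the right combination for this mapping) is to combine a broad match with a boosted phrase clause so that exact phrases score higher:

{
  "query": {
    "bool": {
      "must": {
        "match": { "text": "hotel oberoi in delhi" }
      },
      "should": {
        "match_phrase": {
          "text": { "query": "hotel oberoi in delhi", "boost": 10 }
        }
      }
    }
  }
}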

I really like Elasticsearch, but implementing an autocompleter with it is giving me a lot of trouble.

How can I modify my index and search analyzers for better results?

PS: I'm using Elasticsearch 0.90.3 and right now I can't move to the latest version.

Please help with the above, guys.

Thanks





Custom Score Query not working in elasticsearch-0.90.3

2014-01-30 Thread coder
Hi,

I'm facing an issue with Elasticsearch 0.90.3 in my Java code. I'm writing the following query:

String script = "_score * (doc['po'].empty ? 1 : doc['po'].value == 0.0 ? 1 : doc['po'].value)";
QueryBuilder queryBuilder = QueryBuilders.customScoreQuery(
        QueryBuilders.queryString(query)
                .field("text", 30)
                .field("ad")
                .field("st")
                .field("cn")
                .field("co")
                .defaultOperator(Operator.AND))
        .script(script);

But the above script calculates the same score for similar documents even when they have different po values. The issue is that it does not seem to take doc['po'] into consideration.

The following is the output of the explain parameter:

{category=Hotel, text=hotels in ranchi, count=45.0, 
_id=525472d7d4a769f431649936, location={lon=85.3, lat=23.35},  po=8.8}
1195.7068 = custom score, product of:
  1195.7068 = script score function: composed of:
239.14136 = sum of:
  215.63971 = max of:
215.63971 = sum of:
  18.701233 = weight(text:ho in 170585) [PerFieldSimilarity], 
result of:
18.701233 = score(doc=170585,freq=1.0 = termFreq=1.0
), product of:
  0.27964544 = queryWeight, product of:
2.7864501 = idf(docFreq=960416, maxDocs=5731988)
0.10035904 = queryNorm
  66.8748 = fieldWeight in 170585, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
2.7864501 = idf(docFreq=960416, maxDocs=5731988)
24.0 = fieldNorm(doc=170585)
  22.496681 = weight(text:hot in 170585) [PerFieldSimilarity], 
result of:
22.496681 = score(doc=170585,freq=1.0 = termFreq=1.0
), product of:
  0.30671278 = queryWeight, product of:
3.056155 = idf(docFreq=733378, maxDocs=5731988)
0.10035904 = queryNorm
  73.34772 = fieldWeight in 170585, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
3.056155 = idf(docFreq=733378, maxDocs=5731988)
24.0 = fieldNorm(doc=170585)
  22.589373 = weight(text:hote in 170585) [PerFieldSimilarity], 
result of:
22.589373 = score(doc=170585,freq=1.0 = termFreq=1.0
), product of:
  0.307344 = queryWeight, product of:
3.062 = idf(docFreq=728780, maxDocs=5731988)
0.10035904 = queryNorm
  73.498665 = fieldWeight in 170585, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
3.062 = idf(docFreq=728780, maxDocs=5731988)
24.0 = fieldNorm(doc=170585)
  22.600088 = weight(text:hotel in 170585) [PerFieldSimilarity], 
result of:
22.600088 = score(doc=170585,freq=1.0 = termFreq=1.0
), product of:
  0.30741686 = queryWeight, product of:
3.0631707 = idf(docFreq=728251, maxDocs=5731988)
0.10035904 = queryNorm
  73.5161 = fieldWeight in 170585, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
3.0631707 = idf(docFreq=728251, maxDocs=5731988)
24.0 = fieldNorm(doc=170585)
  129.25233 = weight(text:hotels in 170585) [PerFieldSimilarity], 
result of:
129.25233 = score(doc=170585,freq=1.0 = termFreq=1.0
), product of:
  0.73517686 = queryWeight, product of:
7.3254676 = idf(docFreq=10260, maxDocs=5731988)
0.10035904 = queryNorm
  175.81122 = fieldWeight in 170585, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
7.3254676 = idf(docFreq=10260, maxDocs=5731988)
24.0 = fieldNorm(doc=170585)
  23.50164 = max of:
23.50164 = weight(text:in^30.0 in 170585) [PerFieldSimilarity], 
result of:
  23.50164 = score(doc=170585,freq=1.0 = termFreq=1.0
), product of:
0.31348857 = queryWeight, product of:
  30.0 = boost
  3.1236706 = idf(docFreq=685498, maxDocs=5731988)
  0.0033453014 = queryNorm
74.968094 = fieldWeight in 170585, product of:
  1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
  3.1236706 = idf(docFreq=685498, maxDocs=5731988)
  24.0 = fieldNorm(doc=170585)
  1.0 = queryBoost

{category=Hotel, text=hotels in kerala, count=5.0, 
_id=525472d7d4a769f4316499ca, location={lon=93.95, lat=24.81667}, 
po=9.228571428571428}
1195.7068 = custom score, product of:
  1195.7068 = script score function: composed of:
239.14136 = sum of:
  215.63971 = max of:
215.6397

Custom Score script parameter not working

2014-01-30 Thread coder
Hi,

I'm using the following custom score script parameter to customize the scoring of my documents.

String script = "doc['po'].value";

But it gives me this error:

[Error: could not access: value; in class: 
org.elasticsearch.index.fielddata.ScriptDocValues$Empty

All my documents contain this field, so why is Elasticsearch not able to access its value?

Any idea ?

I have also included that field in my mapping.
   "po": {
  "type": "double",
  "boost": 4.0
  }

We can use boost on such a field, right?
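
For what it's worth, the ScriptDocValues$Empty error usually means that at least some documents have no value for the field; a guarded version of the script (just a sketch, wrapped here in a simple query for illustration) falls back to a constant when po is absent:

{
  "query": {
    "custom_score": {
      "query": { "match_all": {} },
      "script": "doc['po'].empty ? 1 : doc['po'].value"
    }
  }
}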

Thanks



Can anyone please explain the mapping in Elasticsearch ?

2014-01-29 Thread coder
Hi,

I have an existing mapping with me but I'm not able to understand it fully.

curl -XPUT 'http://localhost:9200/auto_index/' 
-d '{
 "settings" : {
"index" : {
"number_of_shards" : 1,
"number_of_replicas" : 1,
"analysis" : {
   "analyzer" : {
  "str_search_analyzer" : {
  "tokenizer" : "standard",
  "filter" : 
["lowercase","asciifolding","suggestion_shingle","edgengram"]
   },
   "str_index_analyzer" : {
 "tokenizer" : "standard",
 "filter" : 
["lowercase","asciifolding","suggestions_shingle","edgengram"]
  }
   },
   "filter" : {
   "suggestions_shingle": {
   "type": "shingle",
   "min_shingle_size": 2,
   "max_shingle_size": 5
  },
  "edgengram" : {
  "type" : "edgeNGram",
  "min_gram" : 2,
  "max_gram" : 30,
  "side" : "front"
  }
  }
  },
  "similarity" : {
 "index": {
 "type": "default"
 },
 "search": {
 "type": "default"
 }
  }
 }
  }
}'

curl -XPUT 'localhost:9200/auto_index/autocomplete/_mapping' -d '{
"autocomplete":{
   "_boost" : {
"name" : "po", 
"null_value" : 4.0
   },
   "properties": {
"ad": {
"type": "string",
"search_analyzer" : "str_search_analyzer",
"index_analyzer" : "str_index_analyzer",
"omit_norms": "true",
"similarity": "index"
},
"category": {
"type": "string",
"include_in_all" : false
},
"cn": {
"type": "string",
"search_analyzer" : "str_search_analyzer",
"index_analyzer" : "str_index_analyzer",
"omit_norms": "true",
"similarity": "index"
},
"ctype": {
"type": "string",
"search_analyzer" : "keyword",
"index_analyzer" : "keyword",
"omit_norms": "true",
"similarity": "index"
},
"eid": {
"type": "string",
"include_in_all" : false
},
"st": {
"type": "string",
"search_analyzer" : "str_search_analyzer",
"index_analyzer" : "str_index_analyzer",
"omit_norms": "true",
"similarity": "index"
},
"co": {
"type": "string",
"include_in_all" : false
},
"st": {
"type": "string",
"search_analyzer" : "str_search_analyzer",
"index_analyzer" : "str_index_analyzer",
"omit_norms": "true",
"similarity": "index"
},
"co": {
"type": "string",
"search_analyzer" : "str_search_analyzer",
"index_analyzer" : "str_index_analyzer",
"omit_norms": "true",
"similarity": "index"
},
"po": {
"type": "double",
"boost": 4.0
},
"en":{
"type": "boolean"
},
"_oid":{
"type": "long"
},
"text": {
"type": "string",
"search_analyzer" : "str_search_analyzer",
"index_analyzer" : "str_index_analyzer",
"omit_norms": "true",
"similarity": "index"
},
"url": {
"type": "string"
}   
 }
 }
}'

I'm not able to understand how exactly the index and search analyzers work. Let's say I have the following documents:

{text:hotels in hosur}
{text:hotels in innsburg}
{text: hotels in mp}
{text: hotels in ink}
{text:hotels in ranchi}

Now, if I query for 'hotels in', I get "hotels in hosur" at the top. Is that because there are more occurrences of "ho" in that doc?

Can anyone please explain, with a sample sentence, how exactly my query string gets analyzed?
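
A rough way to look at both questions (a sketch; the index and analyzer names are taken from the mapping above) is to run the analyzer on a sample sentence and to ask for an explanation of the scoring:

curl -XGET 'localhost:9200/auto_index/_analyze?analyzer=str_index_analyzer&pretty=true' -d 'hotels in hosur'

curl -XGET 'localhost:9200/auto_index/_search?pretty=true' -d '{
  "explain": true,
  "query": { "query_string": { "query": "hotels in", "default_field": "text" } }
}'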

Thanks


Need some help with Custom Score Query

2014-01-29 Thread coder
Hi,

I'm using a custom score query to fetch my results from Elasticsearch. I used a popularity field, "po", in the script parameter, but somehow it's not working. I used the explain parameter to see how the score is calculated, and the problem is that it doesn't seem to take po into consideration at all. I have printed the score calculation for two docs below: both have the same score, although the po field differs between them. Can anyone tell me where I am going wrong?

I'm using the following custom score query in my Java code:
String script = "_score * (doc['po'].empty ? 1 : doc['po'].value == 0.0 ? 1 : doc['po'].value)";
QueryBuilder queryBuilder = QueryBuilders.customScoreQuery(
        QueryBuilders.queryString(query)
                .field("text", 30)
                .field("ad")
                .field("st")
                .field("cn")
                .field("co")
                .defaultOperator(Operator.AND))
        .script(script);


{category=Hotel, text=hotels in ranchi, count=45.0, 
_id=525472d7d4a769f431649936, location={lon=85.3, lat=23.35},  po=8.8}
1195.7068 = custom score, product of:
  1195.7068 = script score function: composed of:
239.14136 = sum of:
  215.63971 = max of:
215.63971 = sum of:
  18.701233 = weight(text:ho in 170585) [PerFieldSimilarity], 
result of:
18.701233 = score(doc=170585,freq=1.0 = termFreq=1.0
), product of:
  0.27964544 = queryWeight, product of:
2.7864501 = idf(docFreq=960416, maxDocs=5731988)
0.10035904 = queryNorm
  66.8748 = fieldWeight in 170585, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
2.7864501 = idf(docFreq=960416, maxDocs=5731988)
24.0 = fieldNorm(doc=170585)
  22.496681 = weight(text:hot in 170585) [PerFieldSimilarity], 
result of:
22.496681 = score(doc=170585,freq=1.0 = termFreq=1.0
), product of:
  0.30671278 = queryWeight, product of:
3.056155 = idf(docFreq=733378, maxDocs=5731988)
0.10035904 = queryNorm
  73.34772 = fieldWeight in 170585, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
3.056155 = idf(docFreq=733378, maxDocs=5731988)
24.0 = fieldNorm(doc=170585)
  22.589373 = weight(text:hote in 170585) [PerFieldSimilarity], 
result of:
22.589373 = score(doc=170585,freq=1.0 = termFreq=1.0
), product of:
  0.307344 = queryWeight, product of:
3.062 = idf(docFreq=728780, maxDocs=5731988)
0.10035904 = queryNorm
  73.498665 = fieldWeight in 170585, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
3.062 = idf(docFreq=728780, maxDocs=5731988)
24.0 = fieldNorm(doc=170585)
  22.600088 = weight(text:hotel in 170585) [PerFieldSimilarity], 
result of:
22.600088 = score(doc=170585,freq=1.0 = termFreq=1.0
), product of:
  0.30741686 = queryWeight, product of:
3.0631707 = idf(docFreq=728251, maxDocs=5731988)
0.10035904 = queryNorm
  73.5161 = fieldWeight in 170585, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
3.0631707 = idf(docFreq=728251, maxDocs=5731988)
24.0 = fieldNorm(doc=170585)
  129.25233 = weight(text:hotels in 170585) [PerFieldSimilarity], 
result of:
129.25233 = score(doc=170585,freq=1.0 = termFreq=1.0
), product of:
  0.73517686 = queryWeight, product of:
7.3254676 = idf(docFreq=10260, maxDocs=5731988)
0.10035904 = queryNorm
  175.81122 = fieldWeight in 170585, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
7.3254676 = idf(docFreq=10260, maxDocs=5731988)
24.0 = fieldNorm(doc=170585)
  23.50164 = max of:
23.50164 = weight(text:in^30.0 in 170585) [PerFieldSimilarity], 
result of:
  23.50164 = score(doc=170585,freq=1.0 = termFreq=1.0
), product of:
0.31348857 = queryWeight, product of:
  30.0 = boost
  3.1236706 = idf(docFreq=685498, maxDocs=5731988)
  0.0033453014 = queryNorm
74.968094 = fieldWeight in 170585, product of:
  1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
  3.1236706 = idf(docFreq=685498, maxDocs=5731988)
  24.0 = fieldNorm(doc=170585)
  1.0 = queryBoost

{category=

Re: fs river giving error on reading large text file.

2014-01-25 Thread coder . ajit
Hi,

It is a simple .txt file that contains plain text only.

How can I verify whether a file is indexable or not?

Thanks,
Ajitpal

On Saturday, January 25, 2014 10:01:46 PM UTC, David Pilato wrote:
>
> I never tested fsriver with such files.
> What kind of file is it?
>
> If it's not really indexable, I would exclude it.
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
> Le 25 janv. 2014 à 20:14, "joerg...@gmail.com " <
> joerg...@gmail.com > a écrit :
>
> Do you want a 2.5G file being a single hit?
>
> The correct method is to write code for reading the file in a stream-like 
> manner and extract the relevant content into JSON documents for search 
> hits. 
>
> If not, you have to prepare the file and partition it into docs by a 
> domain specific parser, a task the fs river was not built for.
>
> Jörg
>
>
>
> On Sat, Jan 25, 2014 at 7:38 PM, > wrote:
>
>> Hello All,
>>
>> I have tried Elasticsearch + fs-river plugin to read the local directory 
>> and file system. I have a file about 2.5 gb text file. While reading this 
>> file, it gives error and dump the heap to elastic search folder. I have 
>> started the es server with 6gb memory as given in elastic search 
>> configuration.
>>
>> i have tried to check the code in fs-river plugin, it load the file using 
>> following code. 
>>
>>
>>
>> FileInputStream fileReader = new FileInputStream(file);
>>
>> // write it to a byte[] using a buffer since we don't know 
>> the exact
>> // image size
>> byte[] buffer = new byte[1024];
>> ByteArrayOutputStream bos = new ByteArrayOutputStream();
>> int i = 0;
>> while (-1 != (i = fileReader.read(buffer))) {
>> bos.write(buffer, 0, i);
>> }
>> byte[] data = bos.toByteArray();
>>
>> fileReader.close();
>> bos.close();
>>
>>
>> Is there any way to parse the large text based file in ES server using 
>> Fs-River? Did any one got success in loading heavy text files in ES server?
>>
>> There are not lot of setting to do, its really easy. If i didn't set any 
>> property. Please let me know.
>>
>> Thanks,
>> APS
>>



fs river giving error on reading large text file.

2014-01-25 Thread coder . ajit
Hello All,

I have tried Elasticsearch with the fs-river plugin to read a local directory and file system. I have a text file of about 2.5 GB. While reading this file, it throws an error and dumps the heap into the Elasticsearch folder. I have started the ES server with 6 GB of memory, as set in the Elasticsearch configuration.

I checked the code in the fs-river plugin; it loads the file using the following code.



FileInputStream fileReader = new FileInputStream(file);

// write it to a byte[] using a buffer since we don't know the exact image size
byte[] buffer = new byte[1024];
ByteArrayOutputStream bos = new ByteArrayOutputStream();
int i = 0;
while (-1 != (i = fileReader.read(buffer))) {
    bos.write(buffer, 0, i);
}
byte[] data = bos.toByteArray();

fileReader.close();
bos.close();


Is there any way to parse large text-based files in the ES server using fs-river? Has anyone had success loading heavy text files into ES?

There aren't a lot of settings to configure, so it should be straightforward; if I have missed setting some property, please let me know.

Thanks,
APS



How to configure elasticsearch to sort the scored documents on a field after score for documents is calculated ?

2014-01-21 Thread coder
Hi,

I want to configure Elasticsearch so that, after calculating the score of each document for a user query, it sorts the scored documents by some other field and then returns the response. There may be cases where several documents have the same score, and I want Elasticsearch to break those ties using another field; that's why I need this functionality. Can anyone tell me how I can add it to my mapping?
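
For reference, a minimal sketch of what I have in mind (the index name and the tie-breaker field po are only examples): sort primarily on _score and secondarily on another field.

curl -XGET 'localhost:9200/myindex/_search?pretty=true' -d '{
  "query": { "query_string": { "query": "hotels in delhi" } },
  "sort": [
    "_score",
    { "po": { "order": "desc" } }
  ]
}'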

Thanks



How to use ElasticSearch to implement Autocompleter ?

2014-01-17 Thread coder
Hi,

I'm trying to use Elasticsearch to implement an autocompleter for my college project, just like the ones some travel websites have, but I'm facing some issues with the implementation.

I'm using following mapping for my case:-

curl -XPUT 'http://localhost:9200/auto_index/' 
-d '{
 "settings" : {
"index" : {
"number_of_shards" : 1,
"number_of_replicas" : 1,
"analysis" : {
   "analyzer" : {
  "str_search_analyzer" : {
  "tokenizer" : "standard",
  "filter" : ["lowercase","asciifolding","
suggestion_shingle","edgengram"]
   },
   "str_index_analyzer" : {
 "tokenizer" : "standard",
 "filter" : 
["lowercase","asciifolding","suggestions_shingle","edgengram"]
  }
   },
   "filter" : {
   "suggestions_shingle": {
   "type": "shingle",
   "min_shingle_size": 2,
   "max_shingle_size": 5
  },
  "edgengram" : {
  "type" : "edgeNGram",
  "min_gram" : 2,
  "max_gram" : 30,
  "side" : "front"
  },
  "mynGram" : {
"type" : "nGram",
"min_gram" : 2,
"max_gram" : 30
  }
  }
  },
  "similarity" : {
 "index": {
 "type": 
"org.elasticsearch.index.similarity.CustomSimilarityProvider"
 },
 "search": {
 "type": 
"org.elasticsearch.index.similarity.CustomSimilarityProvider"
 }
  }
 }
  }
}'

curl -XPUT 'localhost:9200/auto_index/autocomplete/_mapping' -d '{
"autocomplete":{
   "_boost" : {
"name" : "po", 
"null_value" : 4.0
   },
   "properties": {
"ad": {
"type": "string",
"search_analyzer" : "str_search_analyzer",
"index_analyzer" : "str_index_analyzer",
"omit_norms": "true",
"similarity": "index"
},
"category": {
"type": "string",
"include_in_all" : false
},
"cn": {
"type": "string",
"search_analyzer" : "str_search_analyzer",
"index_analyzer" : "str_index_analyzer",
"omit_norms": "true",
"similarity": "index"
},
"ctype": {
"type": "string",
"search_analyzer" : "keyword",
"index_analyzer" : "keyword",
"omit_norms": "true",
"similarity": "index"
},
"eid": {
"type": "string",
"include_in_all" : false
},
"st": {
"type": "string",
"search_analyzer" : "str_search_analyzer",
"index_analyzer" : "str_index_analyzer",
"omit_norms": "true",
"similarity": "index"
},
"co": {
"type": "string",
"include_in_all" : false
},
"st": {
"type": "string",
"search_analyzer" : "str_search_analyzer",
"index_analyzer" : "str_index_analyzer",
"omit_norms": "true",
"similarity": "index"
},
"co": {
"type": "string",
"search_analyzer" : "str_search_analyzer",
"index_analyzer" : "str_index_analyzer",
"omit_norms": "true",
"similarity": "index"
},
"po": {
"type": "double",
"boost": 4.0
},
"en":{
"type": "boolean"
},
"_oid":{
"type": "long"
},
"text": {
"type": "string",
"search_analyzer" : "str_search_analyzer",
"index_analyzer" : "str_index_analyzer",
"omit_norms": "true",
"similarity": "index"
},
"url": {
"type": "string"
}   
 }
 }
}'

and then in my Java code, I'm forming the query like this:

String

Re: How to use ElasticSearch Custom Similarity provider classes ?

2014-01-16 Thread coder

I solved the error, but Elasticsearch is still using its default similarity. Can anyone tell me why it is not taking the new CustomSimilarity classes into consideration?
On Thursday, 16 January 2014 16:54:02 UTC+5:30, coder wrote:
>
> Hi,
>
> I'm using the following two java files for overriding the Default 
> Similarity of ElasticSearch 0.90.3 but it's not working for me. 
>
>
> https://github.com/awnuxkjy/es-custom-similarity-provider/tree/master/src/main/java/org/elasticsearch/index/similarity
>
> I have complied the two java files and added the corresponding class files 
> in my /lib/elasticsearc-0.90.3.jar file in the path 
> org/elasticsearch/index/similarity. Now, I'm trying to use those files in 
> my mapping like mentioned in:-
>
> curl -XPOST 'http://host:port/tweeter/' -d '
> {
>   "settings": {
> "similarity": {
>   "index": {
> "type": "org.elasticsearch.index.similarity.CustomSimilarityProvider"
>   },
>   "search": {
> "type": "org.elasticsearch.index.similarity.CustomSimilarityProvider"
>   }
> }
>   }
> }'
>
> But it's giving me following error:-
>
> {"error":"IndexCreationException[[acqindex] failed to create index]; nested: 
> ElasticSearchIllegalArgumentException[SimilarityProvider [my_similarity] must 
> have an associated type]; ","status":400}
>
> Can anyone please tell me how can I use these two files for overriding the 
> default similarity ?
>
> Thanks
>
>
>



How to use ElasticSearch Custom Similarity provider classes ?

2014-01-16 Thread coder
Hi,

I'm using the following two java files for overriding the Default 
Similarity of ElasticSearch 0.90.3 but it's not working for me. 

https://github.com/awnuxkjy/es-custom-similarity-provider/tree/master/src/main/java/org/elasticsearch/index/similarity

I have compiled the two Java files and added the corresponding class files to my /lib/elasticsearch-0.90.3.jar under the path org/elasticsearch/index/similarity. Now I'm trying to use those classes in my mapping, as mentioned in:

curl -XPOST 'http://host:port/tweeter/' -d '
{
  "settings": {
"similarity": {
  "index": {
"type": "org.elasticsearch.index.similarity.CustomSimilarityProvider"
  },
  "search": {
"type": "org.elasticsearch.index.similarity.CustomSimilarityProvider"
  }
}
  }
}'

But it's giving me following error:-

{"error":"IndexCreationException[[acqindex] failed to create index]; nested: 
ElasticSearchIllegalArgumentException[SimilarityProvider [my_similarity] must 
have an associated type]; ","status":400}

Can anyone please tell me how I can use these two files to override the default similarity?
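
Judging from the error message, a named similarity entry (the name my_similarity comes from the error, not from the settings shown above) needs an explicit type. Below is a hedged sketch of how a named similarity is typically declared and then referenced from a field mapping; this is an assumption about the setup, not a confirmed fix for this plugin:

curl -XPOST 'http://localhost:9200/tweeter/' -d '{
  "settings": {
    "similarity": {
      "my_similarity": {
        "type": "org.elasticsearch.index.similarity.CustomSimilarityProvider"
      }
    }
  },
  "mappings": {
    "tweet": {
      "properties": {
        "text": { "type": "string", "similarity": "my_similarity" }
      }
    }
  }
}'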

Thanks




How can we use elasticsearch custom similarity plugin in mapping ?

2014-01-15 Thread coder
Hi,

I need to override the Lucene default similarity class that Elasticsearch uses for indexing and searching. Searching the net, I found some implementations that do similar things, but my difficulty is that I have no idea how to actually wire this into my code. I found this resource:

https://github.com/tlrx/elasticsearch-custom-similarity-provider

curl -XPOST 'http://host:port/tweeter/' -d '{
  "settings": {
"similarity": {
  "index": {
"type": "org.elasticsearch.index.similarity.CustomSimilarityProvider"
  },
  "search": {
"type": "org.elasticsearch.index.similarity.CustomSimilarityProvider"
  }
}
  }}'

I'm not able to understand how I can use this line directly during indexing so that my default similarity is replaced by this custom similarity:

*org.elasticsearch.index.similarity.CustomSimilarityProvider*

Can anyone please tell me how I can do this? I tried running the same thing on my machine, but it's not working for me. Do I need to modify this line or the path?

Thanks in advance.
