Re: simple query string with flags returns no results

2015-04-28 Thread Roger de Cordova Farias
Are you actually using a comma after "firstname^1.3"? That trailing comma
makes it invalid JSON in both cases
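Roger's point is easy to verify: strict JSON parsers reject a trailing comma inside an array, so Elasticsearch would fail to parse the request body before the query ever runs. A quick check with Python's json module (the field names are taken from the posted query):

```python
import json

# The "fields" array as posted, with and without the trailing comma
invalid = '{"fields": ["lastname^6.5", "firstname^1.3",]}'
valid = '{"fields": ["lastname^6.5", "firstname^1.3"]}'

def parses(s):
    """Return True if s is valid JSON."""
    try:
        json.loads(s)
        return True
    except ValueError:
        return False

print(parses(invalid))  # False: trailing comma is invalid JSON
print(parses(valid))    # True
```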

2015-04-28 14:15 GMT-03:00 Daniel Nill danielln...@gmail.com:

 curl -XPUT 'http://0.0.0.0:9200/users' -d '{
   "first_name": "daniel",
   "last_name": "nill"
 }'


 curl -XGET 'http://0.0.0.0:9200/users/_search' -d '{
   "query": {
     "bool": {
       "must": {
         "simple_query_string": {
           "query": "daniel nill",
           "fields": [
             "lastname^6.5",
             "firstname^1.3",
           ],
           "default_operator": "and",
           "flags": "AND|OR|NOT|PHRASE|PRECEDENCE"
         }
       }
     }
   }
 }'

 this returns no results

 However,

 curl -XGET 'http://0.0.0.0:9200/users/_search' -d '{
   "query": {
     "bool": {
       "must": {
         "simple_query_string": {
           "query": "daniel nill",
           "fields": [
             "lastname^6.5",
             "firstname^1.3",
           ],
           "default_operator": "and"
         }
       }
     }
   }
 }'

 This returns results.

 Any idea what I'm missing?

 This is on 1.5.1

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/83d32c16-80b7-4428-904b-4d5bc9055be0%40googlegroups.com.
 For more options, visit https://groups.google.com/d/optout.




Re: ScrollId doesn't advance with 2 indexes on a read alias 1.4.4

2015-04-14 Thread Roger de Cordova Farias
Are you sure that calling the same scroll_id won't return the next results?

AFAIK,  the scroll_id can be the same and still return new records
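For reference, the scroll loop a client is expected to implement looks like the sketch below: always re-submit the scroll_id from the most recent response (against `_search/scroll`), and stop on the first empty page, whether or not the id string itself changes. The `fetch_page` callable here is a stand-in for the real HTTP call, not an actual client API:

```python
def scroll_all(fetch_page, initial_scroll_id):
    """Drain a scrolled search by always re-submitting the most
    recent scroll_id, stopping on the first empty page."""
    hits, scroll_id = [], initial_scroll_id
    while True:
        page = fetch_page(scroll_id)   # e.g. GET /_search/scroll?scroll=1m
        if not page["hits"]:
            return hits
        hits.extend(page["hits"])
        scroll_id = page["scroll_id"]  # may be identical to the previous id

# Fake transport standing in for Elasticsearch: two pages, same id each time.
pages = iter([
    {"scroll_id": "abc", "hits": [1]},
    {"scroll_id": "abc", "hits": [2]},
    {"scroll_id": "abc", "hits": []},
])
print(scroll_all(lambda sid: next(pages), "abc"))  # [1, 2]
```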

2015-04-14 14:26 GMT-03:00 Todd Nine tn...@apigee.com:

 Hey guys,
   I have 2 indexes.  I have a read alias on both of the indexes (A and B),
 and a write alias on 1 (B).   I then insert 10 documents to the write alias
 which inserts them into index B.  I perform the following query.

 {
   "from": 0,
   "size": 1,
   "post_filter": {
     "bool": {
       "must": {
         "term": {
           "edgeSearch": "4cd2ba95-e2c9-11e4-bb39-c6c6eebe8d56_application__4cd2ba96-e2c9-11e4-bb39-c6c6eebe8d56_owner__users__SOURCE"
         }
       }
     }
   },
   "sort": [
     {
       "fields.double": {
         "order": "asc",
         "nested_filter": {
           "term": { "name": "ordinal" }
         }
       }
     },
     {
       "fields.long": {
         "order": "asc",
         "nested_filter": {
           "term": { "name": "ordinal" }
         }
       }
     },
     {
       "fields.string.exact": {
         "order": "asc",
         "nested_filter": {
           "term": { "name": "ordinal" }
         }
       }
     },
     {
       "fields.boolean": {
         "order": "asc",
         "nested_filter": {
           "term": { "name": "ordinal" }
         }
       }
     }
   ]
 }

 I receive my first record, and a scroll id, as expected.

 On my next request, I perform a request with the scroll id from the
 first response.

 What I expect:  I expect to receive my second record, and a new scrollId.

 What I get:  I get the first record again, with the same scroll Id.

 I'm on a 1.4.4 server, with a 1.4.4 node client running locally
 integration testing.


 When I use the same logic on a read alias with a single index, I do not
 experience this problem, so I'm reasonably certain my client is coded
 correctly.


 Any ideas?

 Thanks,
 Todd



Re: Alert notification with percolator

2015-04-02 Thread Roger de Cordova Farias
I have never used the percolator, but AFAIK you have to call the percolate API
after the document is indexed:

http://www.elastic.co/guide/en/elasticsearch/reference/current/search-percolate.html#_percolating_an_existing_document
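Per the linked page, percolating an already-indexed document is a separate request against the `_percolate` endpoint, and there is also a count variant that returns only the number of matching percolator queries. A sketch of how the two requests are assembled in Elasticsearch 1.x (the index, type, and id names below are placeholders):

```python
import json

def percolate_existing_doc(index, doc_type, doc_id):
    """Request path for percolating an existing document:
    GET /{index}/{type}/{id}/_percolate"""
    return "/{0}/{1}/{2}/_percolate".format(index, doc_type, doc_id)

def percolate_count(index, doc_type, doc):
    """Count variant: returns only how many percolator queries match,
    instead of listing their ids."""
    path = "/{0}/{1}/_percolate/count".format(index, doc_type)
    body = json.dumps({"doc": doc})
    return path, body

print(percolate_existing_doc("logs", "entry", "1"))  # /logs/entry/1/_percolate
```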

2015-04-02 15:25 GMT-03:00 Lincoln Xiong xiong.huang...@gmail.com:

 I am trying to use Elasticsearch as a secondary log output storage, to
 analyze some info in the logs. In this case, an alert trigger would be very
 useful. I read through the docs on the percolator and I think it should be
 the way to do this. But after some experimenting, I found that I don't
 really understand how the percolator works. It seems that if I use the REST
 API to index a document with a percolator already set up, the response will
 tell me whether that document matches the percolator query. In my case, I
 use Logstash as the input, which of course doesn't give this kind of
 feedback. A count API also appears to be accessible over REST that I could
 use to get this kind of feedback from the percolator, but I can't find it
 anywhere.

 Could someone give me an idea of how I can achieve this kind of
 feature with Elasticsearch?

 I know there are ways to trigger an alert in Logstash, but in my case
 Logstash is a temporary tool for inputting the data; I may not use it
 in the future.
 I also noticed that Graylog has a kind of alert: when an input event matches
 some keywords, the alarm triggers. I guess it also uses percolator
 APIs, but I'd like to know how I can do this with Elasticsearch alone.

 Thanks a lot.



Re: How does Elasticsearch convert dates to JSON string representations?

2015-03-20 Thread Roger de Cordova Farias
Elastic won't edit your source. The long type is used internally

2015-03-20 14:16 GMT-03:00 Erik Iverson erikriver...@gmail.com:

 Hello everyone,

 I have a question on how Elasticsearch returns JSON representations of
 fields with the date type. My confusion comes from the fact that the page
 http://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-core-types.html
  says:

 The date type is a special type which maps to JSON string type. It
 follows a specific format that can be explicitly set. *All dates are UTC.
 Internally, a date maps to a number type long, with the added parsing stage
 from string to long and from long to string.* (emphasis mine)

 It sounds like dates are stored as type 'long'. But when I POST documents
 with dates and then retrieve them, they are returned in the same format as
 I POSTed them. So it appears ES is storing how I POSTed each date
 somewhere.

 I have a reproducible curl example demonstrating my confusion in more
 detail on Stackoverflow here:


 http://stackoverflow.com/questions/29157945/how-does-elasticsearch-convert-dates-to-json-string-representations

 Thank you for any insights!


 Best,
 --Erik Iverson



Re: How does Elasticsearch convert dates to JSON string representations?

2015-03-20 Thread Roger de Cordova Farias
Well, the company won't edit his source anyway :p (but I get your point;
I'm used to referring to Elasticsearch as Elastic, and I have to fix that)

I think his question is: he posts a document with a date in string format
and retrieves it in the same format. He was expecting to retrieve it as a
long, since that is the type Elasticsearch uses internally.

I'm not familiar with the internal code of Elasticsearch, but as far as I
know, it won't change the source during indexing. It probably uses a long
in the index, but when you retrieve the source, you get back the exact
source you posted
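The two-level behavior can be sketched as follows: the string in `_source` is returned verbatim, while the value used for sorting and range queries is the parsed UTC epoch in milliseconds. A minimal illustration (the format string is an assumption matching the default `dateOptionalTime`-style input used here):

```python
from datetime import datetime
import calendar

def to_index_value(date_string):
    """What the index stores internally: UTC epoch milliseconds
    parsed from the date string."""
    dt = datetime.strptime(date_string, "%Y-%m-%dT%H:%M:%SZ")
    return calendar.timegm(dt.timetuple()) * 1000

source = {"created": "2015-03-20T00:00:00Z"}  # kept verbatim in _source
indexed = to_index_value(source["created"])   # long used internally

print(source["created"])  # returned unchanged on retrieval
print(indexed)            # 1426809600000
```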

2015-03-20 16:16 GMT-03:00 Mark Walkom markwal...@gmail.com:

 It's Elasticsearch, Elastic is the company :)


 We convert dates to unix epoch, which is why you should insert them as UTC.

 On 20 March 2015 at 10:22, Roger de Cordova Farias 
 roger.far...@fontec.inf.br wrote:

 Elastic won't edit your source. The long type is used internally

 2015-03-20 14:16 GMT-03:00 Erik Iverson erikriver...@gmail.com:

 Hello everyone,

 I have a question on how Elasticsearch returns JSON representations of
 fields with the date type. My confusion comes from the fact that the page
 http://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-core-types.html
  says:

 The date type is a special type which maps to JSON string type. It
 follows a specific format that can be explicitly set. *All dates are
 UTC. Internally, a date maps to a number type long, with the added parsing
 stage from string to long and from long to string.* (emphasis mine)

 It sounds like dates are stored as type 'long'. But when I POST
 documents with dates and then retrieve them, they are returned in the same
 format as I POSTed them. So it appears ES is storing how I POSTed each date
 somewhere.

 I have a reproducible curl example demonstrating my confusion in more
 detail on Stackoverflow here:


 http://stackoverflow.com/questions/29157945/how-does-elasticsearch-convert-dates-to-json-string-representations

 Thank you for any insights!


 Best,
 --Erik Iverson



Re: What's wrong with this query?

2015-03-17 Thread Roger de Cordova Farias
Look at this example on how to use multiple filters:
http://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html#_multiple_filters

You should wrap them in a bool filter
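A sketch of that restructuring, with the term filters from the question moved inside `bool.must` of a 1.x `filtered` query (the helper function is hypothetical, just to show the shape of the resulting body):

```python
import json

def filtered_query(term_filters):
    """Build a filtered query whose multiple term filters are wrapped
    in a bool filter, as the linked multiple-filters example shows."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "bool": {
                        "must": [{"term": t} for t in term_filters]
                    }
                }
            }
        }
    }

q = filtered_query([{"searchTerm1": "N"}, {"searchTerm2": "Y"}])
print(json.dumps(q, indent=2))
```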

2015-03-17 15:32 GMT-03:00 jrkroeg jrkr...@gmail.com:

 I'm trying to get the top 100 documents that match the filtered criteria,
 sorted by distance from pin.location.

 Here's my query. It doesn't produce an error, but it also isn't returning
 the results it should:

 {
   "query": {
     "filtered": {
       "query": {
         "match_all": {}
       },
       "filter": [
         { "term": { "searchTerm1": "N" } },
         { "term": { "searchTerm2": "Y" } },
         { "term": { "searchTerm3": "Y" } },
         { "term": { "searchTerm4": "Y" } }
       ]
     }
   },
   "sort": [
     {
       "_geo_distance": {
         "pin.location": {
           "lat": 34.073620,
           "lon": -118.400356
         },
         "order": "asc",
         "unit": "mi"
       }
     }
   ],
   "size": 100
 }


 On a separate note, I'd like to find a way to make the filter more of a
 suggestion rather than a hard requirement. How would I achieve this?



Terms aggregations in docs with nested objects using a lot of memory

2015-03-02 Thread Roger de Cordova Farias
We are running Elasticsearch in a cluster with 1 node, 1 index, 6 shards,
and 55 million docs. We run queries with terms aggregations on 15 fields and
it works well, taking about 10 seconds to return.

We reindexed the docs into another cluster with 1 node, 1 index, 4 shards, and
the same 55 million docs to run some tests. The mapping is a little
different, now having some nested objects. We run the same queries as
before (adapted to use nested queries and aggregations), but we always
get a circuit breaker error, because loading the fields into memory for the
aggregation would take more memory than is available.

Both machines have the same configuration (64GB of memory, running ES
with ES_HEAP_SIZE=32g).

I used the stats API to get some info about the fielddata
(_stats/fielddata?fields=my_field&pretty) on both machines for a field
that didn't have any change in its mapping, existing directly in the root
document (not nested), and I got a huge difference in memory usage:

*Machine 1:*

{
  "_shards": {
    "total": 8,
    "successful": 4,
    "failed": 0
  },
  "_all": {
    "primaries": {
      "fielddata": {
        "memory_size_in_bytes": 28132578552,
        "evictions": 0,
        "fields": {
          "my_field": {
            "memory_size_in_bytes": 224983649
          }
        }
      }
    },
    "total": {
      "fielddata": {
        "memory_size_in_bytes": 28132578552,
        "evictions": 0,
        "fields": {
          "my_field": {
            "memory_size_in_bytes": 224983649
          }
        }
      }
    }
  },
  "indices": {
    "my_index_1": {
      "primaries": {
        "fielddata": {
          "memory_size_in_bytes": 28132578552,
          "evictions": 0,
          "fields": {
            "my_field": {
              "memory_size_in_bytes": 224983649
            }
          }
        }
      },
      "total": {
        "fielddata": {
          "memory_size_in_bytes": 28132578552,
          "evictions": 0,
          "fields": {
            "my_field": {
              "memory_size_in_bytes": 224983649
            }
          }
        }
      }
    }
  }
}


*Machine 2:*

{
  "_shards": {
    "total": 12,
    "successful": 6,
    "failed": 0
  },
  "_all": {
    "primaries": {
      "fielddata": {
        "memory_size_in_bytes": 6812053739,
        "evictions": 0,
        "fields": {
          "my_field": {
            "memory_size_in_bytes": 62533082
          }
        }
      }
    },
    "total": {
      "fielddata": {
        "memory_size_in_bytes": 6812053739,
        "evictions": 0,
        "fields": {
          "my_field": {
            "memory_size_in_bytes": 62533082
          }
        }
      }
    }
  },
  "indices": {
    "my_index_2": {
      "primaries": {
        "fielddata": {
          "memory_size_in_bytes": 6812053739,
          "evictions": 0,
          "fields": {
            "my_field": {
              "memory_size_in_bytes": 62533082
            }
          }
        }
      },
      "total": {
        "fielddata": {
          "memory_size_in_bytes": 6812053739,
          "evictions": 0,
          "fields": {
            "my_field": {
              "memory_size_in_bytes": 62533082
            }
          }
        }
      }
    }
  }
}


While in the old index the field uses *62.5331MB*, in the new index it uses
*224.984MB*. Heavier fields that use about 1GB in the old index are using
4~6GB in the new index. With the 15 aggregations together, the memory usage
increased to a size that won't fit in the heap.

Does the fact that the documents have nested objects change the amount of
memory needed to keep non-nested fields in memory?

I tested using include_in_root on every nested object and doing all my
aggregations directly on the root doc (not using nested aggregations at all),
and still every field uses far more memory than in the old index, with the
same data. Can someone explain it? I have no clue
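For reference, the MB figures quoted above are just the raw byte counts from the stats output divided by 10^6:

```python
old_bytes = 62533082   # my_field fielddata in the old (flat) index
new_bytes = 224983649  # my_field fielddata in the new (nested-mapping) index

print(round(old_bytes / 1e6, 4))       # 62.5331 MB
print(round(new_bytes / 1e6, 3))       # 224.984 MB
print(round(new_bytes / old_bytes, 1)) # ~3.6x more memory for the same field
```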



Search within array

2015-01-27 Thread Roger de Cordova Farias
I'm searching on an array of objects.

The problem is that when I search using a query string query
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-dsl-query-string-query),
it matches text split across different objects (different array positions).
Is there a way to avoid this behavior and make the query string match only
within the same array position?

I know that I could index the field with a high position_offset_gap and
search using a phrase, but I don't need the text to be in order, only within
the same array position
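For context, the workaround mentioned above hinges on Lucene positions: a large gap between array entries keeps position-based (phrase/proximity) queries from matching across entries. A sketch of such a mapping fragment in 1.x (the field name is a placeholder):

```python
import json

# Mapping sketch: a string field whose array entries are separated by a
# large position gap, so phrase/proximity queries cannot span entries.
mapping = {
    "properties": {
        "lines": {
            "type": "string",
            "position_offset_gap": 100
        }
    }
}
print(json.dumps(mapping, indent=2))
```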



Re: Ignore a field in the scoring

2015-01-08 Thread Roger de Cordova Farias
Thank you very much

2015-01-08 4:35 GMT-02:00 Masaru Hasegawa haniomas...@gmail.com:

 Hi,

 I believe it's intended according to
 https://lucene.apache.org/core/4_10_2/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
 .
 It says:
 --
 Note that CollectionStatistics.maxDoc() is used instead of
 IndexReader#numDocs() because also TermStatistics.docFreq() is used, and
 when the latter is inaccurate, so is CollectionStatistics.maxDoc(), and in
 the same direction. In addition, CollectionStatistics.maxDoc() is more
 efficient to compute
 --

 Masaru

 On Thu, Jan 8, 2015 at 12:01 AM, Roger de Cordova Farias 
 roger.far...@fontec.inf.br wrote:

 Thank you for your explanation

 Do you know if it is a bug or intended behavior?

 I don't think deleted (marked as deleted) docs should be used at all

 2015-01-07 1:53 GMT-02:00 Masaru Hasegawa haniomas...@gmail.com:

 Hi,

 Update is delete and add: instead of updating the existing document, it
 deletes it and adds it as a new document.
 And those deleted documents are just marked as deleted and aren't
 actually removed from the index until a segment merge.

 IDF doesn't take deletions into account (it still counts those
 deleted-but-not-removed documents).
 That's the reason you see a different IDF score (you can see that both
 maxDocs and docFreq are incremented).

 Regarding 424 vs. 0: the document had ID 424 (Lucene's internal ID).
 But when the document was updated (delete + add), it got a new ID, 0, in a
 new segment.

 So, I think it's not possible to keep the score stable when you update
 documents. You can run optimize with max_num_segments=1 every time you
 update documents, but that's not practical (and until the optimize is done,
 you see a different score)


 Masaru





Re: Ignore a field in the scoring

2015-01-05 Thread Roger de Cordova Farias
Now I ran the query with explain = true. The results are the following:


*Explain before the update:*


  "details": [
    {
      "value": 5.752348,
      "description": "fieldWeight in 424, product of:",
      "details": [
        {
          "value": 1,
          "description": "tf(freq=1.0), with freq of:",
          "details": [
            {
              "value": 1,
              "description": "termFreq=1.0"
            }
          ]
        },
        {
          "value": 9.203756,
          "description": "idf(docFreq=201, maxDocs=738240)"
        },
        {
          "value": 0.625,
          "description": "fieldNorm(doc=424)"
        }
      ]
    }
  ]



*Update script (scriptLang = groovy, profileId = 1):*

if (ctx._source.bookmarked_by == null) {
  ctx._source.bookmarked_by = [profileId]
} else if (ctx._source.bookmarked_by.contains(profileId)) {
  ctx.op = "none"
} else {
  ctx._source.bookmarked_by += profileId
}



*Explain after the update:*

  "details": [
    {
      "value": 5.749262,
      "description": "fieldWeight in 0, product of:",
      "details": [
        {
          "value": 1,
          "description": "tf(freq=1.0), with freq of:",
          "details": [
            {
              "value": 1,
              "description": "termFreq=1.0"
            }
          ]
        },
        {
          "value": 9.198819,
          "description": "idf(docFreq=202, maxDocs=738241)"
        },
        {
          "value": 0.625,
          "description": "fieldNorm(doc=0)"
        }
      ]
    }
  ]



* Query used with the explain:*

{
  "query": {
    "query_string": {
      "fields": ["name"],
      "query": "roger"
    }
  }
}





The inverse document frequency (idf) changed after adding a new field
that is not used in the query. The update also changed "fieldWeight in 424"
and "fieldNorm(doc=424)" to "fieldWeight in 0" and "fieldNorm(doc=0)" (I
don't know if that changes anything)

Can someone help me figure out how to keep the document's score unchanged
after running the update? Note that the update creates the new field if it
was not found (== null), but this field is not used in the query
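The idf deltas in the two explain outputs are consistent with Lucene's classic formula, idf = 1 + ln(maxDocs / (docFreq + 1)), with the updated (delete + add) document still counted until segments merge:

```python
import math

def idf(doc_freq, max_docs):
    """Lucene TFIDFSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

before = idf(201, 738240)  # explain before the update
after = idf(202, 738241)   # one extra deleted-but-not-merged copy counted

print(round(before, 4))  # ~9.2038, matching the first explain output
print(round(after, 4))   # ~9.1990, matching the second, slightly lower
```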

2015-01-05 13:35 GMT-02:00 Roger de Cordova Farias 
roger.far...@fontec.inf.br:

 The added field is an array of Integers, but we are not using it in the
 query at all

 We are not querying the _all field, it is disabled in our type mapping

 Our query is something like this:

 {
   "query": {
     "query_string": {
       "fields": ["name"],
       "query": "roger"
     }
   }
 }


 I ran this query. In the first result, I added a new field called
 bookmarked_by with a numeric value. Then I ran the same query again. The
 document in which I added the new field is no longer the first result

 2014-12-26 17:34 GMT-02:00 Doug Turnbull 
 dturnb...@opensourceconnections.com:

 Are you querying the _all field? How are you doing your searches?

 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-all-field.html

 The _all field receives a copy of every  field you index, so adding data
 here could impact scores regardless of the source field.

 Otherwise, fields are scored independently before being put together by
 other queries like boolean queries or dismax. Are you using
 boolean/dismax/etc over multiple fields?

 -Doug

 On Fri, Dec 26, 2014 at 11:59 AM, Ivan Brusic i...@brusic.com wrote:

 Use the field in a filter and not part of the query. Is this field free
 text?

 Ivan
 On Dec 23, 2014 9:12 PM, Roger de Cordova Farias 
 roger.far...@fontec.inf.br wrote:

 Hello

  Our documents have metadata indexed with them, but we don't want the
  metadata to interfere with the scoring

  After a user searches for documents, they can bookmark them (which means
  we add more metadata to the document); then in the next search with the
  same query the bookmarked document appears in a lower (worse) position

  Is there a way to completely ignore one or more specific fields in the
  scoring of every query? At indexing time or something?

  Note that we are not using the metadata field in the query, yet it
  lowers the score of every query

  We cannot set the index attribute of this field to "no" because we
  are going to use it in other queries


Re: Ignore a field in the scoring

2015-01-05 Thread Roger de Cordova Farias
The added field is an array of Integers, but we are not using it in the
query at all

We are not querying the _all field, it is disabled in our type mapping

Our query is something like this:

{
  "query": {
    "query_string": {
      "fields": ["name"],
      "query": "roger"
    }
  }
}


I ran this query. In the first result, I added a new field called
bookmarked_by with a numeric value. Then I ran the same query again. The
document in which I added the new field is no longer the first result

2014-12-26 17:34 GMT-02:00 Doug Turnbull 
dturnb...@opensourceconnections.com:

 Are you querying the _all field? How are you doing your searches?

 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-all-field.html

 The _all field receives a copy of every  field you index, so adding data
 here could impact scores regardless of the source field.

 Otherwise, fields are scored independently before being put together by
 other queries like boolean queries or dismax. Are you using
 boolean/dismax/etc over multiple fields?

 -Doug

 On Fri, Dec 26, 2014 at 11:59 AM, Ivan Brusic i...@brusic.com wrote:

 Use the field in a filter and not part of the query. Is this field free
 text?

 Ivan
 On Dec 23, 2014 9:12 PM, Roger de Cordova Farias 
 roger.far...@fontec.inf.br wrote:

 Hello

  Our documents have metadata indexed with them, but we don't want the
  metadata to interfere with the scoring

  After a user searches for documents, they can bookmark them (which means
  we add more metadata to the document); then in the next search with the
  same query the bookmarked document appears in a lower (worse) position

  Is there a way to completely ignore one or more specific fields in the
  scoring of every query? At indexing time or something?

  Note that we are not using the metadata field in the query, yet it
  lowers the score of every query

  We cannot set the index attribute of this field to "no" because we are
  going to use it in other queries





 --
 Doug Turnbull
 Search  Big Data Architect
 OpenSource Connections http://o19s.com





Ignore a field in the scoring

2014-12-23 Thread Roger de Cordova Farias
Hello

Our documents have metadata indexed with them, but we don't want the
metadata to interfere in the scoring

After a user searches for documents, they can bookmark them (which means we
add more metadata to the document); then in the next search with the same
query the bookmarked document appears in a lower (worse) position.

Is there a way to completely ignore one or more specific fields in the
scoring of every query, e.g. at indexing time?

Note that we are not using the metadata field in the query, and yet it
lowers the score of every query.

We cannot set the "index" attribute of this field to "no" because we are
going to use it in other queries.
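A sketch of the kind of query this points toward (field names and query text here are hypothetical, not from the original post): score only on the content field, and keep metadata matching, when you need it, in a non-scoring filter. In ES 1.x syntax:

```json
{
  "query": {
    "filtered": {
      "query": {
        "match": { "content": "search terms" }
      },
      "filter": {
        "term": { "bookmarked_by": "roger" }
      }
    }
  }
}
```

Filter clauses do not contribute to _score, so metadata matched this way cannot push a bookmarked document down the ranking.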



How to use json in update script

2014-12-16 Thread Roger de Cordova Farias
Hello

I'm trying to update a document whose root object contains a list of nested
objects. I need to send an object of the nested type as a script parameter
to append to the list

How can I append the JSON (a string) to the root object's list of nested
objects using Groovy? Or should I use another script language?

I tried using JsonSlurper (http://groovy-lang.org/json.html) in Groovy,
which converts between JSON and Groovy objects, but I always get:

Caused by:
 org.elasticsearch.script.groovy.GroovyScriptCompilationException:
 MultipleCompilationErrorsException[startup failed:
 Script3.groovy: 2: unable to resolve class JsonSlurper
  @ line 2, column 19.
def jsonSlurper = new JsonSlurper();
  ^
 1 error
 ]
 at
 org.elasticsearch.script.groovy.GroovyScriptEngineService.compile(GroovyScriptEngineService.java:117)
 at
 org.elasticsearch.script.ScriptService.getCompiledScript(ScriptService.java:368)
 at org.elasticsearch.script.ScriptService.compile(ScriptService.java:354)
 at
 org.elasticsearch.script.ScriptService.executable(ScriptService.java:497)
 at
 org.elasticsearch.action.update.UpdateHelper.prepare(UpdateHelper.java:149)
 ... 8 more



Re: How to use json in update script

2014-12-16 Thread Roger de Cordova Farias
Ok, I found out that I can send a JSON object as a script parameter and just
append it to the nested objects list (with "list += newObject" or
"list.add(newObject)") using Groovy, and it works.

But it is not working with the Java API; I can only get it to work using
the REST API.

When using Java, the JSON is treated as a string, and I get the error:

object mapping [objectsList] trying to serialize a value with no field
 associated with it, current value [{"field":"value"}]


I can reproduce the error in the REST API by wrapping the JSON parameter
with quotes:

*Works (using REST API):*

{
   "script": "ctx._source.objectsList += newObject",
   "params": {
     "newObject": {"field": "value"}
   },
   "lang": "groovy"
}


*Does not work (using REST API):*

{
   "script": "ctx._source.objectsList += newObject",
   "params": {
     "newObject": "{\"field\": \"value\"}"
   },
   "lang": "groovy"
}
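For reference, the working form above corresponds to a full update request along these lines (index, type, and document id are placeholder values):

```json
POST /myindex/mytype/1/_update
{
  "script": "ctx._source.objectsList += newObject",
  "params": {
    "newObject": {"field": "value"}
  },
  "lang": "groovy"
}
```

The key point is that "newObject" is passed as a JSON object, not as a quoted string.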


*Does not work (using JAVA API):*

String script = "ctx._source.objectsList += newObject";




2014-12-16 13:04 GMT-02:00 Roger de Cordova Farias 
roger.far...@fontec.inf.br:

 Hello

 I'm trying to update a document whose root object contains a list of
 nested objects. I need to send an object of the nested type as a script
 parameter to append to the list

 How can I append the json (a string type) to the nested objects list of
 the root object using Groovy? or should I use another script lang?

 I tried using JsonSlurper http://groovy-lang.org/json.html in Groovy,
 that converts between json and Groovy objects, but I always get:

 Caused by:
 org.elasticsearch.script.groovy.GroovyScriptCompilationException:
 MultipleCompilationErrorsException[startup failed:
 Script3.groovy: 2: unable to resolve class JsonSlurper
  @ line 2, column 19.
def jsonSlurper = new JsonSlurper();
  ^
 1 error
 ]
 at
 org.elasticsearch.script.groovy.GroovyScriptEngineService.compile(GroovyScriptEngineService.java:117)
 at
 org.elasticsearch.script.ScriptService.getCompiledScript(ScriptService.java:368)
 at org.elasticsearch.script.ScriptService.compile(ScriptService.java:354)
 at
 org.elasticsearch.script.ScriptService.executable(ScriptService.java:497)
 at
 org.elasticsearch.action.update.UpdateHelper.prepare(UpdateHelper.java:149)
 ... 8 more






Re: How to use json in update script

2014-12-16 Thread Roger de Cordova Farias
*Does not work (using JAVA API):*

 String script = "ctx._source.objectsList += newObject";
 UpdateRequestBuilder prepareUpdate = client.prepareUpdate(indexName,
 typeName, id);
 prepareUpdate.setScriptLang("groovy");
 prepareUpdate.setScript(script, ScriptType.INLINE);
 prepareUpdate.addScriptParam("newObject", "{\"status\":\"aasdsd\"}");
 prepareUpdate.get();



 Is there a way to reproduce the working REST API behavior with the Java
API?

2014-12-16 15:17 GMT-02:00 Roger de Cordova Farias 
roger.far...@fontec.inf.br:

 Ok, I found out that I can send a JSON as a script parameter and just
 append it to the nested objects list (with list += newObject or
 list.add(newObject) ) using groovy and it works

 But it is not working with the Java API, I can only get it to work using
 the REST API.

 When using Java the JSON is treated as a string, then I get the error:

 object mapping [objectsList] trying to serialize a value with no field
 associated with it, current value [{"field":"value"}]


 I can reproduce the error in the REST API by wrapping the JSON parameter
 with quotes:

 *Works (using REST API):*

 {
   "script": "ctx._source.objectsList += newObject",
   "params": {
     "newObject": {"field": "value"}
   },
   "lang": "groovy"
 }


 *Does not work (using REST API):*

 {
   "script": "ctx._source.objectsList += newObject",
   "params": {
     "newObject": "{\"field\": \"value\"}"
   },
   "lang": "groovy"
 }


 *Does not work (using JAVA API):*

 String script = "ctx._source.objectsList += newObject";




 2014-12-16 13:04 GMT-02:00 Roger de Cordova Farias 
 roger.far...@fontec.inf.br:

 Hello

 I'm trying to update a document whose root object contains a list of
 nested objects. I need to send an object of the nested type as a script
 parameter to append to the list

 How can I append the json (a string type) to the nested objects list of
 the root object using Groovy? or should I use another script lang?

 I tried using JsonSlurper http://groovy-lang.org/json.html in Groovy,
 that converts between json and Groovy objects, but I always get:

 Caused by:
 org.elasticsearch.script.groovy.GroovyScriptCompilationException:
 MultipleCompilationErrorsException[startup failed:
 Script3.groovy: 2: unable to resolve class JsonSlurper
  @ line 2, column 19.
def jsonSlurper = new JsonSlurper();
  ^
 1 error
 ]
 at
 org.elasticsearch.script.groovy.GroovyScriptEngineService.compile(GroovyScriptEngineService.java:117)
 at
 org.elasticsearch.script.ScriptService.getCompiledScript(ScriptService.java:368)
 at org.elasticsearch.script.ScriptService.compile(ScriptService.java:354)
 at
 org.elasticsearch.script.ScriptService.executable(ScriptService.java:497)
 at
 org.elasticsearch.action.update.UpdateHelper.prepare(UpdateHelper.java:149)
 ... 8 more






Unique values on the matching docs

2014-12-05 Thread Roger de Cordova Farias
Hello

I have a query with a from/size, and I need to get the unique values of a
specific field from the returned docs only. I could do it on the client side,
but it would help if Elasticsearch could do it for me.

The Terms Aggregation helps with getting unique values, but it ignores the
from/size of the query.

Is there a way to run the Terms Aggregation on the returned results only, or
is there another way of getting unique values on the search result?

Thanks in advance

Roger
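For context, the terms aggregation mentioned above looks roughly like this (field names are hypothetical). Note that aggregations run over every document matching the query, not just the from/size page, which is exactly the limitation described:

```json
{
  "query": { "match": { "title": "something" } },
  "from": 0,
  "size": 10,
  "aggs": {
    "unique_values": {
      "terms": { "field": "category" }
    }
  }
}
```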



Re: Advice on migrating 1.3.2 to 1.4.1

2014-12-04 Thread Roger de Cordova Farias
Thank you for the advice

2014-12-04 9:30 GMT-02:00 Elvar Böðvarsson elv...@gmail.com:

 I upgraded our logging cluster to 1.4 without any problems.

 When I looked into upgrading a separate dev/test instance used for a
 different purpose I ran into problems with the plugins. If you are using
 plugins, make sure they are supported in 1.4.





Advice on bookmarking docs

2014-12-04 Thread Roger de Cordova Farias
We have a lot of docs like this:

{
  "_type": "doc",
  "_id": "123",
  "_source": {
    "parent_name": "abc"
  }
}

Each doc has only one parent_name, but multiple docs can have the same
parent. It is like a many-to-one relationship, but the parent has no other
info apart from its name, so we didn't create a separate doc for parents.

Now we want to provide our users the option to bookmark parents so they can
later run queries only on docs that are children of their bookmarked parents.
We could easily do that with a terms filter like this:

{
  "filter": {
    "terms": {
      "parent_name": [
        "abc",
        "def",
        "ghi"
      ]
    }
  }
}

We could pass to the filter all the user's bookmarked parents' names, which
are persisted, let's say, in a relational database.

But the problem is that we have more than 50 million docs and a user can
bookmark millions of parents. It would be too heavy to send a filter with
millions of terms in every request, so we need to handle the bookmarks
directly in Elasticsearch.

We considered using a filtered alias, so that the very same filter is
persisted in Elasticsearch and we don't have to pass it in every request.
This would already be far better than passing the filter in each request,
but we want more: we want it to be very performant. Filtering with
millions of terms would be slow, even if we don't need to send the filter
in the request.

Now we decided to add to our docs a meta field with information like "who
bookmarked me", something like this:

{
  "_type": "doc",
  "_id": "123",
  "_source": {
    "parent_name": "abc",
    "bookmarked_by": [
      "roger",
      "john"
    ]
  }
}

Then we can use a term (term, without the s) filter like this:

{
  "filter": {
    "term": {
      "bookmarked_by": "roger"
    }
  }
}

That would be (I hope) far more performant than our last approach, but it
still has issues.

The problem we would have now is updating bookmarks.
When the user bookmarks/un-bookmarks a parent, we can query for all
docs with this parent and update their bookmarked_by field with the user
identifier. That is OK.
But what happens when we add a new doc whose parent the user bookmarked
before?
We could query for the other docs with the same parent and copy their
bookmarked_by field to the new doc, but that is ugly.

So we concluded we need to have the bookmarked_by field centralized in a
parent doc.

We considered the following approaches:

*1 - parent-child relationship*

{
  "_type": "parent",
  "_id": "1",
  "_source": {
    "bookmarked_by": [
      "roger",
      "john"
    ]
  }
}

{
  "_type": "child",
  "_id": "1",
  "_parent": "1",
  "_source": {}
}
{
  "_type": "child",
  "_id": "2",
  "_parent": "1",
  "_source": {}
}

Then, when user roger does a query on the children, the query would also
have a has_parent filter like this:

{
  "has_parent": {
    "parent_type": "parent",
    "filter": {
      "term": {
        "bookmarked_by": "roger"
      }
    }
  }
}

*2 - nested type*

{
  "_type": "parent",
  "_id": "1",
  "_source": {
    "bookmarked_by": [
      "roger",
      "john"
    ],
    "children": [
      {
        "id": "1"
      },
      {
        "id": "2"
      }
    ]
  }
}

Then,  when user roger does a query, we use a nested query to query only
the children with bookmarked parents:

{
  "nested": {
    "path": "children",
    "query": {
      actual_query,
      "filter": {
        "has_parent": {
          "parent_type": "parent",
          "filter": {
            "term": {
              "bookmarked_by": "roger"
            }
          }
        }
      }
    }
  }
}


*3 - No actual joins approach*


{
  "_type": "parent",
  "_id": "1",
  "_source": {
    "name": "abc",
    "bookmarked_by": [
      "roger",
      "john"
    ]
  }
}

{
  "_type": "child",
  "_id": "1",
  "_source": {
    "parent_name": "abc",
    "bookmarked_by": [
      "roger",
      "john"
    ]
  }
}

{
  "_type": "child",
  "_id": "2",
  "_source": {
    "parent_name": "abc",
    "bookmarked_by": [
      "roger",
      "john"
    ]
  }
}

Then, every time a parent gets updated, we query for all its children
(using the parent_name field) and update their bookmarked_by fields to
reflect the updated parent's bookmarked_by field.
And every time we add a new child doc we query for its parent and copy the
parent's bookmarked_by field to the new doc





The main problem with the first 2 approaches is the need to do joins at
runtime. I didn't test them, but I think that joining across millions of docs
could be far slower than not joining at all.
Also, the nested type approach has the issue of returning the parent doc on
queries, and we need to return the matching children only.
The third approach looks to be the most performant one, but it is almost
as ugly as not having the parent in a separate doc at all.

I may have put some wrong information here, as I didn't test every
approach. I'm only using common knowledge and some guessing, but I hope I
have described our problems well.

I would like some advice; maybe I missed a better approach?

Thanks in advance


Re: Advice on migrating 1.3.2 to 1.4.1

2014-12-03 Thread Roger de Cordova Farias
Thank you for your response

Looks like I read the documentation wrong; the "Fields referred
to in alias filters must exist in the mappings of the index/indices pointed
to by the alias." part was only introduced in 1.4.0.Beta1.

Anyway, I found the terms lookup mechanism
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-terms-filter.html#_terms_lookup_mechanism
which also solves our problem of sending a big filter in every request.
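A sketch of that terms lookup mechanism (the index/type/id/path values here are hypothetical): the filter pulls its term list from a field of another document instead of receiving it in the request body:

```json
{
  "filter": {
    "terms": {
      "parent_name": {
        "index": "bookmarks",
        "type": "user",
        "id": "roger",
        "path": "parent_names"
      }
    }
  }
}
```

The bookmark list then lives in a single document that can be updated independently of the search requests.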

What we are doing is that the user, after running a search, can "bookmark"
the results. Then he has the option to run new searches on his
bookmarked docs only.

Adding metadata to our docs with information on who bookmarked them would
work too; it will just be harder to update, because the user can
bookmark/un-bookmark them on the fly and in batches (like "bookmark all
docs of the search result").

I will study the approaches to see which one fits better for us

Thank you very much

2014-12-03 11:46 GMT-02:00 Adrien Grand adrien.gr...@elasticsearch.com:

 Hi,

 1.4 changed a lot of things, especially at the distributed system level,
 so testing it in your staging environment will certainly help ensure that
 things work as expected.

 Filtered aliases have been available for a long time (even before
 1.4.0.Beta1), so it's very likely that they are already available in the
 version you are currently running. However, a filter containing 10
 million ids will be slow anyway; even if you cache it, the first
 execution on a new segment might cause latency spikes, since there are lots
 of postings lists that need to be merged. Would it be possible to change it
 to a simpler term filter, e.g. by adding more metadata to your documents?

 On Mon, Dec 1, 2014 at 9:23 PM, Roger de Cordova Farias 
 roger.far...@fontec.inf.br wrote:

 Hello

 We currently have a cluster with 50 millions of docs using ElasticSearch
 version 1.3.2

 We were looking for something like a persisted filter, and the filtered
 aliases, added in version 1.4.0.beta1, seems perfect for it

 Our infrastructure team is not happy about upgrading it in production
 without doing a lot of tests first, so we have to do a lot of tests and
 upgrade later

 We are looking for some advice on what can go wrong with this upgrade:
 what are the risks?

 And also, is there a way to implement a persistent filter in our
 current version? I mean, some of our users will have access to only a part
 of our data; we need something like a database view. We could send a filter
 in every request, but that would be too slow with, let's say, 10 million
 ids.





 --
 Adrien Grand





Re: Trouble formulating a query with Java API

2014-12-01 Thread Roger de Cordova Farias
You can use the toString() method of the SearchRequestBuilder to see the
generated query. With your example it was:

{
  "size" : 10,
  "query" : {
    "multi_match" : {
      "query" : "searchterm",
      "fields" : [ "FIELD1.not_analyzed", "FIELD2.partial" ]
    }
  },
  "sort" : [ {
    "SCORE" : {
      "order" : "desc"
    }
  } ]
}

This query looks OK. Are you not receiving any results? Not even the
"total" value?

2014-11-28 16:11 GMT-02:00 Maarten Roosendaal mroosendaa...@gmail.com:

 Hi,

 I have the following (json) query i use:
 {
   "fields": ["ID", "ID2"],
   "query": {
     "filtered": {
       "query": {
         "multi_match": {
           "query": "searchterm",
           "fields": ["FIELD1.not_analyzed", "FIELD2.partial"]
         }
       },
       "filter": {
         "bool": {
           "must": [
             { "term": { "FIELD2": "No" } },
             { "term": { "FIELD3": "Yes" } }
           ]
         }
       }
     }
   },
   "sort": [
     {
       "SCORE": {
         "order": "desc"
       }
     }
   ]
 }

 but my attempts at building the same query with the Java API haven't been
 fruitful. The goal is to search for a match based on a search term in
 several fields, where some fields are more important than others.

 I know the basic setup:

 client
 .prepareSearch()
 .setSearchType(SearchType.QUERY_AND_FETCH)
 .setQuery(QueryBuilders.multiMatchQuery("searchterm", "FIELD1.not_analyzed",
 "FIELD2.partial"))
 .addSort("SCORE", SortOrder.DESC)
 .setSize(10)
 .execute()
 .actionGet();

 but there are no results, while the JSON query does return results. So 2
 questions:
 1) I could use some help formulating the right Java query
 2) Why does the JSON query return results while the Java query does not?

 Thanks,
 Maarten





Advice on migrating 1.3.2 to 1.4.1

2014-12-01 Thread Roger de Cordova Farias
Hello

We currently have a cluster with 50 millions of docs using ElasticSearch
version 1.3.2

We were looking for something like a persisted filter, and the filtered
aliases, added in version 1.4.0.beta1, seems perfect for it

Our infrastructure team is not happy about upgrading it in production without
doing a lot of tests first, so we have to do a lot of tests and upgrade
later

We are looking for some advice on what can go wrong with this upgrade:
what are the risks?

And also, is there a way to implement a persistent filter in our current
version? I mean, some of our users will have access to only a part of our
data; we need something like a database view. We could send a filter in every
request, but that would be too slow with, let's say, 10 million ids.
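For reference, a filtered alias (the 1.4 feature discussed here) is created by POSTing to the _aliases endpoint with a body like the one below; the index, alias, and field names are hypothetical. Searches against the alias then apply the filter automatically:

```json
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "myindex",
        "alias": "user_roger",
        "filter": { "term": { "allowed_user": "roger" } }
      }
    }
  ]
}
```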



Re: Aggregation with a whole string as key.

2014-11-18 Thread Roger de Cordova Farias
You have to index it as a single token.

You can have the same string indexed twice using multi fields:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_multi_fields.html#_multi_fields

Then you can index the string not analyzed (as in the multi-fields page's
example), or use the keyword tokenizer if you need the field analyzed:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-keyword-tokenizer.html#analysis-keyword-tokenizer
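A minimal sketch of such a multi-field mapping for the LogMessage case discussed here (the type name is illustrative); an aggregation would then target LogMessage.raw to get one bucket per whole message instead of one per word:

```json
{
  "mappings": {
    "logs": {
      "properties": {
        "LogMessage": {
          "type": "string",
          "fields": {
            "raw": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }
  }
}
```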


2014-11-18 11:46 GMT-02:00 Jörgen Lundberg jorgen.lundb...@gmail.com:

 Hi all,
 I asked this question at Stack Overflow last week.

 http://stackoverflow.com/questions/26909312/is-it-possible-to-aggregate-over-a-whole-string-in-a-logstash-query

 In Kibana I'm trying to aggregate the top errors in our log by aggregating
 over a term we call LogMessage. This works well except that the aggregation
 counts the number of times each word in the LogMessage appears.

 Is it possible to aggregate over a whole string, or am I thinking about
 this the wrong way?

 /Jörgen





Advice on mapping field to huge text value

2014-11-14 Thread Roger de Cordova Farias
Hello

I have to create a mapping for a type that will have a text field with
values:

- that are huge (more than 32KB),
- that are very badly structured, and will contain snippets like "elas tic
search" that I need to find when the user searches for "elasticsearch" or
"elastic search"

I can't modify the source of the text (it is extracted from a PDF file), so
I have to handle these issues in the type mapping

Can someone give me some advice on how to map this field?



Resume scroll-scan query?

2014-10-23 Thread Roger de Cordova Farias
I'm reindexing an Elasticsearch index with 50m docs, using a scroll-scan
request to retrieve all docs, but my reindexer program stopped at 30m

Is there a way to redo the query to retrieve the remaining docs, e.g. using
an offset?

Would the internal order of the scan query be the same on a second
request?

I can assure that no new docs were indexed in the old index since the 
beginning of the reindexing
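For reference, a scroll-scan in ES 1.x is started with search_type=scan and a scroll TTL on the URL; each subsequent batch is then fetched from /_search/scroll with the _scroll_id returned by the previous call. The index name and values below are illustrative:

```json
GET /old_index/_search?search_type=scan&scroll=5m
{
  "query": { "match_all": {} },
  "size": 500
}
```

Note that "size" applies per shard for a scan, and as far as I know there is no offset to resume an expired scroll from.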



Re: Resume scroll-scan query?

2014-10-23 Thread Roger de Cordova Farias
Hmm, I was using a small ttl, just enough to process each scroll call, but
I could try using a longer time to live and resuming from the last
scroll_id in case of error

That is a good idea, thanks

2014-10-23 17:12 GMT-02:00 John Smith java.dev@gmail.com:

 The scroll is available based on a timeout value you give it.
 Every time you scroll, you restart the countdown.

 You could track the last scroll id you used and try it again from there?

 On Thursday, 23 October 2014 12:47:02 UTC-4, Roger de Cordova Farias wrote:

 I'm reindexing a ElasticSearch base with 50m docs using the scroll-scan
 request to retrieve all docs, but my reindexer program stopped at 30m

 Is there a way to redo the query to retrieve the left docs? Like using
 offset?

 Would the internal order of the scan query be the same with a second
 request?

 I can assure that no new docs were indexed in the old index since the
 beginning of the reindexing





Re: Resume scroll-scan query?

2014-10-23 Thread Roger de Cordova Farias
I know it resets the ttl on each scroll call, but since I don't have an
automatic resuming process, I need to manually read the last scroll_id (I
will log it to a file) and restart the reindexing program with it. That is
why I need a longer ttl

I just tested re-using the scroll_id. It looks like after the first request
the same scroll_id is returned over and over, and each request with it
returns new docs.

So I can't use this approach, since I would always lose the last batch after
resuming the reindexing
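
Given that, a sketch of a resume-safe loop that persists each batch before
requesting the next one, so a restart never drops a batch that was already
fetched (host, TTL, and file names are illustrative; assumes a 1.x-style
scroll endpoint that takes the scroll_id as the raw request body):

```shell
#!/bin/sh
# Resume from the last logged scroll_id if one exists
SCROLL_ID=$(cat scroll_id.log 2>/dev/null || echo '')

while true; do
  RESPONSE=$(curl -s -XGET 'http://localhost:9200/_search/scroll?scroll=60m' -d "$SCROLL_ID")
  [ -z "$RESPONSE" ] && break                      # node unreachable

  # Persist the batch BEFORE advancing, so a crash never loses a fetched batch
  echo "$RESPONSE" >> batches.jsonl
  SCROLL_ID=$(echo "$RESPONSE" | python3 -c 'import sys,json; print(json.load(sys.stdin)["_scroll_id"])') || break
  echo "$SCROLL_ID" > scroll_id.log

  # An empty batch means the scan is exhausted
  HITS=$(echo "$RESPONSE" | python3 -c 'import sys,json; print(len(json.load(sys.stdin)["hits"]["hits"]))')
  [ "${HITS:-0}" -eq 0 ] && break
done
```

The key ordering is write-batch-then-write-id: on restart, the logged id
always points just past the last batch that reached disk.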

2014-10-23 18:20 GMT-02:00 John Smith java.dev@gmail.com:

 A small ttl is OK (adjusted properly for your process) because every time
 you call scroll it resets the ttl. So you don't need a 60m scroll time; it
 just has to be long enough to process the next scroll id.

 I'm curious whether you can re-use the scroll id. It's not specifically
 mentioned in the docs, but I think the scroll is forward-only, so I'm not
 sure that once you have a scroll id you can go back to it. I guess there's
 one way to find out :)
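
One quick way to find out, assuming `$SCROLL_ID` holds an id from a prior
search on a local node (host and TTL are illustrative):

```shell
# Request the same scroll_id twice and compare the returned doc ids:
# identical output would mean the id is re-usable; different output means
# the scroll is forward-only
ids() { python3 -c 'import sys,json; print([h["_id"] for h in json.load(sys.stdin)["hits"]["hits"]])' 2>/dev/null; }
FIRST=$(curl -s -XGET 'http://localhost:9200/_search/scroll?scroll=5m' -d "$SCROLL_ID" | ids)
SECOND=$(curl -s -XGET 'http://localhost:9200/_search/scroll?scroll=5m' -d "$SCROLL_ID" | ids)
[ "$FIRST" = "$SECOND" ] && echo "same batch twice" || echo "forward-only"
```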

