RangeFilter on Nested/Child Facet Counts
Imagine I have three different types of data stored in my ElasticSearch index: users, posts, and replies. Each post belongs to a particular user. Each reply belongs to a particular user and post.

I'd like to write a query to find all users with a particular number of posts (e.g., find all users with exactly one post, or all users with between ten and twenty posts). Along those same lines, I'd like to find all users with posts having a particular number of replies.

Basically, I'd like to filter documents based on facet counts, which seems like a pretty reasonable thing to do, but I can't figure out any way to do it.

I've tried to model this problem with embedded objects, nested objects, and parent/child objects, but the query DSL doesn't seem to support these kinds of queries, no matter how I model the data.

Any advice?

Benji Smith

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/055da68c-8641-4592-a0f3-a8f82ded29f1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
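One possible workaround for the question above, offered only as a sketch and not as anything the thread confirms: keep a denormalized count on each user document at index time, then apply an ordinary range filter to it. The field name `post_count` and the helper `users_with_post_count` are hypothetical; the `filtered` wrapper follows the query DSL of the 0.90/1.x era.

```python
import json


def users_with_post_count(min_posts, max_posts):
    """Build a search body matching users whose stored post count is in range.

    Assumes each user document carries a hypothetical numeric field
    `post_count` that the application keeps up to date at index time.
    """
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "range": {
                        "post_count": {"gte": min_posts, "lte": max_posts}
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Users with between ten and twenty posts, as in the question.
    print(json.dumps(users_with_post_count(10, 20), indent=2))
```

The trade-off is that the count has to be rewritten on the user document whenever a post is added or removed, so this only fits if that extra write is acceptable.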
Re: NumberFormatException when sorting by numeric document ID
This can *absolutely* be fixed in ElasticSearch. It's not a problem with Lucene, but with how ES data is mapped onto the Lucene data model.

The problem is that types and fields use local names instead of fully-qualified names. With fully-qualified names, as far as Lucene is concerned, there would be a field named "user.id" mapped as a long, another field named "product.id" mapped as a string, and a nested field named "user.address.id" mapped as an integer. Under this kind of system, "user" and "product" can exist in the same index, without even the possibility that their names and types would clash.

benji

On Thursday, February 13, 2014 6:41:41 PM UTC-5, Ivan Brusic wrote:
> I doubt this issue will ever be "fixed" since the limitation exists in
> Lucene. All types belong to the same index and a field's data needs to be
> uniform in Lucene's eyes. A document's type is used to indicate different
> mappings for a document, but not different ways to segment the data types
> in the index itself. This scenario should be documented however, so that
> others do not fall into the same trap.
>
> --
> Ivan
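To make the fully-qualified naming idea concrete, here is a minimal sketch. It does not reflect how ElasticSearch or Lucene actually store fields; the simplified mapping shape (type name, then field name, then either a type string or a nested dict) and the helper `qualify` are assumptions made purely for illustration.

```python
def qualify(mappings):
    """Flatten {type: {field: es_type or nested dict}} into qualified names.

    Every field name is prefixed with its document type (and any enclosing
    nested object), so two types can both declare "id" without colliding.
    """
    flat = {}

    def walk(prefix, fields):
        for name, value in fields.items():
            path = prefix + "." + name
            if isinstance(value, dict):
                # Nested object: recurse with the longer prefix.
                walk(path, value)
            else:
                flat[path] = value

    for type_name, fields in mappings.items():
        walk(type_name, fields)
    return flat


# Hypothetical mappings mirroring the example in the message above.
mappings = {
    "user": {"id": "long", "address": {"id": "integer"}},
    "product": {"id": "string"},
}

qualified = qualify(mappings)
for path in sorted(qualified):
    print(path, "->", qualified[path])
```

Under this naming, the three `id` fields become three distinct keys, which is the property the message argues for.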
Re: NumberFormatException when sorting by numeric document ID
Thanks for your comment! It looks like the correct GitHub issue to reference is this one:

https://github.com/elasticsearch/elasticsearch/issues/4081

I've added my comments, and I'm rooting for a solution to this problem rather than just a warning, which won't really solve the problem for us. Fingers crossed!

benji

On Thursday, February 13, 2014 11:21:08 AM UTC-5, Binh Ly wrote:
> This is on the plate. I'm not 100% sure exactly what the fix will be but
> it could be something along the lines of a warning when a mapping is
> introduced with the same field name but different types.
Re: NumberFormatException when sorting by numeric document ID
Are there any ES committers on the mailing list who wouldn't mind commenting on this issue?

If I have two different types, "user" and "session", and both of them have an "id" field, shouldn't ElasticSearch understand that those are two different fields, and that their fully-qualified names are actually "user.id" and "session.id"? Using only fully-qualified names in the Lucene internals seems like a straightforward way to fix the problem.

Incidentally, it looks like there's a bug report (submitted two YEARS ago!) here:

https://github.com/elasticsearch/elasticsearch/issues/1737

If this is the desirable behavior, then why hasn't this bug been closed as "won't fix"? Or if it's legitimately a bug, why wasn't it fixed before releasing 1.0?

It seems like a pretty fundamental flaw in the system that the functionality of one type can be broken by the definition of another, essentially unrelated type. I can understand why the behavior is what it is, historically, but it seems self-evidently like a bug. In what kind of system would this be the desirable behavior?

Thanks!

benji
Re: NumberFormatException when sorting by numeric document ID
Yes, you guys are right. There are multiple different types in this index, and some of them have LONG ids, while others have INTEGER or STRING ids. I'll have to redesign a few parts of the mappings to fix that problem.

I suppose the same restriction applies across nested types as well, right? I'm heavily using nested types, and many of them have their own inner ID fields.

Thanks for your help!

benji

On Thursday, February 13, 2014 7:45:37 AM UTC-5, Binh Ly wrote:
> Like Alex mentioned, I would check all the mappings to ensure the types of
> the id field are all the same (doesn't matter what the value is in it -
> what matters is the type defined in the mapping). Your error message means:
> in one type the id field is long (in the mapping), and in the other type
> the id field is int (in the mapping). And you are querying across those 2
> types which gives this error.
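The mapping check described above can be automated. The following is a rough sketch that assumes the per-type mappings have already been fetched and parsed into a dict of the form `{type_name: {"properties": {...}}}`; the helper `find_type_conflicts` and the sample data are hypothetical, and the walk over inner `properties` is a simplification of how object and nested fields are laid out.

```python
from collections import defaultdict


def find_type_conflicts(mappings):
    """Return {field_path: {es_type: [doc_types]}} for inconsistent fields.

    A field path is reported only when two or more document types map the
    same path to different field types.
    """
    seen = defaultdict(lambda: defaultdict(list))

    def walk(doc_type, prefix, properties):
        for name, spec in properties.items():
            path = prefix + name
            if "properties" in spec:
                # Object or nested field: descend into its inner fields.
                walk(doc_type, path + ".", spec["properties"])
            else:
                seen[path][spec.get("type", "object")].append(doc_type)

    for doc_type, mapping in mappings.items():
        walk(doc_type, "", mapping.get("properties", {}))

    return {path: dict(types) for path, types in seen.items() if len(types) > 1}


# Hypothetical mappings: "id" is a long in one type and an integer in another.
mappings = {
    "user": {"properties": {"id": {"type": "long"}, "name": {"type": "string"}}},
    "session": {"properties": {"id": {"type": "integer"}}},
}

conflicts = find_type_conflicts(mappings)
print(conflicts)
```

Any path that shows up in the result is a candidate for the kind of cross-type clash discussed in this thread; fields mapped identically everywhere are left out.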
NumberFormatException when sorting by numeric document ID
Hello hello!

I have a bizarre error I've been trying to debug for a few weeks with no luck, and I'm finally left to conclude that it may be a bug in ElasticSearch.

Once every few days, I start seeing shard failures in my query results, like this:

    {
      "index": "my_index",
      "shard": 3,
      "status": 500,
      "reason": "RemoteTransportException[[HOSTNAME][inet[/10.0.123.123:9300]][search/phase/query]]; nested: QueryPhaseExecutionException[[my_index][3]: query[ConstantScore(cache(_type:my_type))],from[0],size[10],sort[]: Query Failed [Failed to execute main query]]; nested: ElasticSearchException[java.lang.NumberFormatException: Invalid shift value (64) in prefixCoded bytes (is encoded value really an INT?)]; nested: UncheckedExecutionException[java.lang.NumberFormatException: Invalid shift value (64) in prefixCoded bytes (is encoded value really an INT?)]; nested: NumberFormatException[Invalid shift value (64) in prefixCoded bytes (is encoded value really an INT?)]; "
    }

This query is operating against an index with about 100 different fields (including several different nested types), but the relevant portion of the mapping looks like this:

    {
      "my_type" : {
        "_id" : { "type" : "long", "path" : "id" },
        "properties": {
          "id": { "type" : "long" },
          /* ... LOTS OF OTHER FIELDS, INCLUDING MANY NESTED TYPES */
        }
      }
    }

I've been able to isolate the shard failures to a minimal query of this form:

    {
      "query" : { "match_all" : { } },
      "sort" : [{ "id" : { "order" : "asc" } }]
    }

Basically, sorting by (numeric) ID causes shard failures when the shards sometimes mistakenly think that there are non-numeric values in the "id" field. I've audited the data, and it conforms with the schema. The id fields always contain valid LONG values.

Whenever the shard failures occur, I can silence them for a few days by optimizing the index, like this:

    curl -XPOST 'http://HOSTNAME:9200/my_index/_optimize?max_num_segments=1'

The shard failures then stop for a day or two, but inevitably, within a few days, the failures return and I have to optimize the index again.

The weird thing is that the status URL always reports GREEN status and all shards healthy, even when these queries are failing on every request.

I experienced these failures originally on 0.90.5, but I continued seeing the same problems after recently upgrading to 0.90.10. I even deleted the index and rebuilt from scratch under 0.90.10, but I've kept seeing the same failures.

Any idea what might be going on?

Thanks!

benji smith