RangeFilter on Nested/Child Facet Counts

2014-03-11 Thread Benji Smith
Imagine I have three different types of data stored in my ElasticSearch 
index: users, posts, and replies. Each post belongs to a particular user. 
Each reply belongs to a particular user and post.

I'd like to write a query to find all users with a particular number of 
posts (e.g., find all users with exactly one post, or all users with 
between ten and twenty posts).
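
To pin down the requested result, here is a minimal client-side sketch in Python. It is not an Elasticsearch query. The `users` and `posts` records and the `users_with_post_count` helper are hypothetical, and all they do is spell out which users the query would need to return: count posts per user, then keep users whose count falls inside an inclusive range.

```python
from collections import Counter

# Hypothetical post records: each post carries the id of the user who wrote it.
posts = [
    {"id": 1, "user_id": "alice"},
    {"id": 2, "user_id": "bob"},
    {"id": 3, "user_id": "bob"},
    {"id": 4, "user_id": "carol"},
]
users = ["alice", "bob", "carol", "dave"]


def users_with_post_count(users, posts, lo, hi):
    """Return users whose number of posts lies in [lo, hi] (inclusive)."""
    counts = Counter(p["user_id"] for p in posts)
    # Counter yields 0 for users with no posts, so a "zero posts" range works too.
    return [u for u in users if lo <= counts[u] <= hi]


print(users_with_post_count(users, posts, 1, 1))   # ['alice', 'carol']
print(users_with_post_count(users, posts, 10, 20))  # []
```
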

Along those same lines, I'd like to find all users with posts having a 
particular number of replies. Basically, I'd like to filter documents based 
on facet counts, which seems like a pretty reasonable thing to do, but I 
can't figure out any way to do it.
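
The second request adds one more level, and it can be sketched the same way. This is again a client-side illustration with hypothetical names (`post_owner`, `replies`, `users_with_post_reply_count`), not an Elasticsearch feature. It assumes "users with posts having N replies" means "users owning at least one such post", which is one of two possible readings:

```python
from collections import Counter

# Hypothetical records: each post is owned by a user, each reply points at a post.
post_owner = {10: "alice", 11: "alice", 12: "bob"}   # post_id -> user_id
replies = [{"post_id": 10}, {"post_id": 10}, {"post_id": 12}]


def users_with_post_reply_count(post_owner, replies, lo, hi):
    """Users owning at least one post whose reply count lies in [lo, hi]."""
    reply_counts = Counter(r["post_id"] for r in replies)
    # A post with no replies is looked up as 0 by the Counter.
    return sorted({user for post_id, user in post_owner.items()
                   if lo <= reply_counts[post_id] <= hi})


print(users_with_post_reply_count(post_owner, replies, 2, 2))  # ['alice']
print(users_with_post_reply_count(post_owner, replies, 1, 1))  # ['bob']
```
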

I've tried to model this problem with embedded objects, nested objects, and 
parent/child objects, but the query DSL doesn't seem to support these kinds 
of queries, no matter how I model the data.

Any advice?

Benji Smith

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/055da68c-8641-4592-a0f3-a8f82ded29f1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: NumberFormatException when sorting by numeric document ID

2014-02-14 Thread Benji Smith
This can *absolutely* be fixed in ElasticSearch. It's not a problem 
with Lucene, but with how ES data is mapped onto the Lucene data model.

The problem is that types and fields use local names instead of 
fully-qualified names. If they used fully-qualified names, then as far as 
Lucene is concerned there would be a field named "user.id" mapped as a long, 
another field named "product.id" mapped as a string, and a field named 
"user.address.id" (inside a nested type) mapped as an integer. Under this 
kind of system, "user" and "product" could coexist in the same index without 
any possibility of their field names and types clashing.
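
The proposed naming scheme can be sketched in Python. The sketch walks a mapping and prefixes every leaf field with its document type and object path. The `qualified_fields` helper and the simplified mapping shape are assumptions for illustration. They are not how Elasticsearch or Lucene actually name fields. The point is that every name in the result is distinct, so no two types can disagree about the same field.

```python
def qualified_fields(mappings):
    """Map 'type.path.to.field' -> core type for every leaf field.

    `mappings` is assumed to look like {type_name: {"properties": {...}}};
    object and nested fields are recognised by having their own "properties".
    """
    out = {}

    def walk(prefix, props):
        for name, spec in props.items():
            path = f"{prefix}.{name}"
            if "properties" in spec:          # object or nested type: recurse
                walk(path, spec["properties"])
            else:
                out[path] = spec.get("type")

    for type_name, body in mappings.items():
        walk(type_name, body.get("properties", {}))
    return out


mappings = {
    "user": {"properties": {
        "id": {"type": "long"},
        "address": {"type": "nested", "properties": {"id": {"type": "integer"}}},
    }},
    "product": {"properties": {"id": {"type": "string"}}},
}
print(qualified_fields(mappings))
# {'user.id': 'long', 'user.address.id': 'integer', 'product.id': 'string'}
```
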

benji


On Thursday, February 13, 2014 6:41:41 PM UTC-5, Ivan Brusic wrote:
>
> I doubt this issue will ever be "fixed" since the limitation exists in 
> Lucene. All types belong to the same index and a field's data needs to be 
> uniform in Lucene's eyes.  A document's type is used to indicate different 
> mappings for a document, but not different ways to segment the data types 
> in the index itself. This scenario should be documented, however, so that 
> others do not fall into the same trap.
>
> -- 
> Ivan
>
>
> On Thu, Feb 13, 2014 at 9:18 AM, Benji Smith wrote:
>
>> Thanks for your comment! It looks like the correct GitHub issue to 
>> reference is this one:
>>
>> https://github.com/elasticsearch/elasticsearch/issues/4081
>>
>> I've added my comments, and I'm rooting for a solution to this problem 
>> rather than just a warning, which won't really solve the problem for us. 
>> Fingers crossed!
>>
>> benji
>>
>>
>> On Thursday, February 13, 2014 11:21:08 AM UTC-5, Binh Ly wrote:
>>>
>>> This is on the plate. I'm not 100% sure exactly what the fix will be, but 
>>> it could be something along the lines of a warning when a mapping is 
>>> introduced with the same field name but different types.
>>>



Re: NumberFormatException when sorting by numeric document ID

2014-02-13 Thread Benji Smith
Thanks for your comment! It looks like the correct GitHub issue to reference 
is this one:

https://github.com/elasticsearch/elasticsearch/issues/4081

I've added my comments, and I'm rooting for a solution to this problem 
rather than just a warning, which won't really solve the problem for us. 
Fingers crossed!

benji

On Thursday, February 13, 2014 11:21:08 AM UTC-5, Binh Ly wrote:
>
> This is on the plate. I'm not 100% sure exactly what the fix will be, but 
> it could be something along the lines of a warning when a mapping is 
> introduced with the same field name but different types.
>



Re: NumberFormatException when sorting by numeric document ID

2014-02-13 Thread Benji Smith
Are there any ES committers on the mailing list who wouldn't mind 
commenting on this issue?

If I have two different types, "user" and "session", and both of them have 
an "id" field, shouldn't ElasticSearch understand that those are two 
different fields, and that their fully-qualified names are actually 
"user.id" and "session.id"? Using only fully-qualified names in the Lucene 
internals seems like a straightforward way to fix the problem.

Incidentally, it looks like there's a bug report (submitted two YEARS ago!) 
here:

https://github.com/elasticsearch/elasticsearch/issues/1737

If this is the intended behavior, then why hasn't this bug been closed as 
"won't fix"? Or if it's legitimately a bug, why wasn't it fixed before 
releasing 1.0? It seems like a pretty fundamental flaw in the system that 
the functionality of one type can be broken by the definition of another 
essentially unrelated type.

I can understand why the behavior is what it is, historically, but it seems 
self-evidently like a bug. In what kind of system would this be the 
desirable behavior?

Thanks!

benji


On Thursday, February 13, 2014 10:48:56 AM UTC-5, Benji Smith wrote:
>
> Yes, you guys are right. There are multiple different types in this index, 
> and some of them have LONG ids, while others have INTEGER or STRING ids. 
> I'll have to redesign a few parts of the mappings to fix that problem.
>
> I suppose the same restriction applies across nested types as well, right? 
> I'm heavily using nested types, and many of them have their own inner ID 
> fields.
>
> Thanks for your help!
>
> benji
>
> On Thursday, February 13, 2014 7:45:37 AM UTC-5, Binh Ly wrote:
>>
>> Like Alex mentioned, I would check all the mappings to ensure the types 
>> of the id field are all the same (doesn't matter what the value is in it - 
>> what matters is the type defined in the mapping). Your error message means: 
>> in one type the id field is long (in the mapping), and in the other type 
>> the id field is int (in the mapping). And you are querying across those 2 
>> types which gives this error.
>>
>



Re: NumberFormatException when sorting by numeric document ID

2014-02-13 Thread Benji Smith
Yes, you guys are right. There are multiple different types in this index, 
and some of them have LONG ids, while others have INTEGER or STRING ids. 
I'll have to redesign a few parts of the mappings to fix that problem.

I suppose the same restriction applies across nested types as well, right? 
I'm heavily using nested types, and many of them have their own inner ID 
fields.

Thanks for your help!

benji

On Thursday, February 13, 2014 7:45:37 AM UTC-5, Binh Ly wrote:
>
> Like Alex mentioned, I would check all the mappings to ensure the types of 
> the id field are all the same (doesn't matter what the value is in it - 
> what matters is the type defined in the mapping). Your error message means: 
> in one type the id field is long (in the mapping), and in the other type 
> the id field is int (in the mapping). And you are querying across those 2 
> types which gives this error.
>
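
The mapping audit suggested above can be sketched in Python. The helper looks for any field path that is mapped with more than one core type across the document types of one index. The `conflicting_fields` helper is hypothetical. The input shape is assumed to resemble the per-index part of a GET _mapping response ({type_name: {"properties": ...}}). The sketch also assumes that sub-fields of object/nested types are indexed under their dotted path without the document type name. That assumption is exactly the open question about nested types, so treat nested results with caution.

```python
from collections import defaultdict


def conflicting_fields(index_mappings):
    """Return {field_path: {doc_type: core_type}} for every field path that
    is mapped with more than one core type across document types.

    ASSUMPTION: object/nested sub-fields are keyed by their dotted path
    without the document type name, so "id" in two types is one field.
    """
    seen = defaultdict(dict)   # field path -> {doc type -> core type}

    def walk(doc_type, prefix, props):
        for name, spec in props.items():
            path = f"{prefix}{name}"
            if "properties" in spec:          # object or nested type: recurse
                walk(doc_type, path + ".", spec["properties"])
            else:
                seen[path][doc_type] = spec.get("type")

    for doc_type, body in index_mappings.items():
        walk(doc_type, "", body.get("properties", {}))
    return {path: types for path, types in seen.items()
            if len(set(types.values())) > 1}


my_index_mappings = {
    "my_type": {"properties": {"id": {"type": "long"}}},
    "session": {"properties": {"id": {"type": "integer"}}},
    "comment": {"properties": {
        "id": {"type": "long"},
        "author": {"type": "nested", "properties": {"id": {"type": "string"}}},
    }},
}
print(conflicting_fields(my_index_mappings))
# {'id': {'my_type': 'long', 'session': 'integer', 'comment': 'long'}}
```
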



NumberFormatException when sorting by numeric document ID

2014-02-12 Thread Benji Smith
Hello hello! I have a bizarre error I've been trying to debug for a few 
weeks with no luck, and I'm finally left to conclude that it may be a bug 
in ElasticSearch.

Once every few days, I start seeing shard failures in my query results, 
like this:

{
  "index": "my_index",
  "shard": 3,
  "status": 500,
  "reason": "RemoteTransportException[[HOSTNAME][inet[/10.0.123.123:9300]][search/phase/query]]; nested: QueryPhaseExecutionException[[my_index][3]: query[ConstantScore(cache(_type:my_type))],from[0],size[10],sort[]: Query Failed [Failed to execute main query]]; nested: ElasticSearchException[java.lang.NumberFormatException: Invalid shift value (64) in prefixCoded bytes (is encoded value really an INT?)]; nested: UncheckedExecutionException[java.lang.NumberFormatException: Invalid shift value (64) in prefixCoded bytes (is encoded value really an INT?)]; nested: NumberFormatException[Invalid shift value (64) in prefixCoded bytes (is encoded value really an INT?)]; "
}


This query is operating against an index with about 100 different fields 
(including several different nested types), but the relevant portion of the 
mapping looks like this:

{
  "my_type" : {
    "_id" : { "type" : "long", "path" : "id" },
    "properties" : {
      "id" : { "type" : "long" },
      /* ... LOTS OF OTHER FIELDS, INCLUDING MANY NESTED TYPES */
    }
  }
}

I've been able to isolate the shard failures to a minimal query of this 
form:

{
  "query" : { "match_all" : { } },
  "sort" : [
    { "id" : { "order" : "asc" } }
  ]
}

Basically, sorting by the numeric ID causes shard failures, as if the shards 
sometimes mistakenly think there are non-numeric values in the "id" field. 
I've audited the data, and it conforms to the schema: the id fields always 
contain valid LONG values.

Whenever the shard failures occur, I can silence them for a few days by 
optimizing the index, like this:

curl -XPOST 'http://HOSTNAME:9200/my_index/_optimize?max_num_segments=1'

And the shard failures will stop for a day or two, but inevitably, within a 
few days the failures will return and I'll have to optimize the index 
again. The weird thing is that the status URL always reports GREEN status 
and all shards healthy, even when these queries are failing on every 
request.

I experienced these failures originally on 0.90.5, but I continued seeing 
the same problems after recently upgrading to 0.90.10. I even deleted the 
index and rebuilt from scratch under 0.90.10, but I've kept seeing the same 
failures.

Any idea what might be going on?

Thanks!

benji smith
