Re: XContentBuilder.copyCurrentStructure() fails with .JsonParseException: Unexpected end-of-input expected close marker for OBJECT

2014-12-19 Thread Bharathi Raja
Hi,
CREATE EXTERNAL TABLE message (
  messageId string,
  messageSize int,
  sender string,
  recipients array<string>,
  messageParts array<struct<extension: string, size: int>>,
  headers map<string, string>
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/Loca1';

JSON:
{
"messageId": "34dd0d3c-f53b-11e0-ac12-d3e782dff199",
"messageSize": 12345,
"sender": "al...@example.com",
"recipients": [
"j...@example.com",
"b...@example.com"
],
"messageParts": [
{
"extension": "pdf",
"size": 4567
},
{
"extension": "jpg",
"size": 9451
}
],
"headers": {
"Received-SPF": "pass",
"X-Broadcast-Id": "9876"
}
}


I get a JsonParseException with unexpected end-of-input. Could you please 
help me correct this?
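One likely cause (an assumption on my part, but a common pitfall with Hive JSON SerDes such as this Cloudera one): Hive's TextInputFormat splits records on newlines, so each JSON document must sit on a single line. A pretty-printed, multi-line object like the one above produces exactly this end-of-input error. A minimal sketch of compacting a document onto one line before loading it into `/Loca1`:

```python
import json

def compact_json(pretty_text):
    """Re-serialize a (possibly pretty-printed) JSON document onto a
    single line, since Hive splits text records on newlines."""
    return json.dumps(json.loads(pretty_text), separators=(",", ":"))

pretty = """{
  "messageId": "34dd0d3c-f53b-11e0-ac12-d3e782dff199",
  "messageSize": 12345
}"""
line = compact_json(pretty)
print(line)  # one record per line, safe for TextInputFormat
```

Running each file through a compaction step like this (or producing one-line JSON upstream) should make the SerDe happy without changing the table definition.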

On Saturday, October 19, 2013 2:09:30 PM UTC+5:30, Hendrik wrote:
>
> $ curl -XGET 'http://localhost:9200/_search' -d '{
> "query" : {
> "term" : { "user" : "kimchy" }
> }
> '
>
>
>
>
> public class MyRestFilterDoingSpecialThings extends RestFilter {
>   ...
> @Override
> public void process(RestRequest request, RestChannel channel,
> RestFilterChain filterChain) { ...
>
> XContentType xContentType = 
> XContentFactory.xContentType(request.content()); //json
> XContentParser parser = 
> XContentFactory.xContent(xContentType).createParser(request.content());
> XContentParser.Token t = parser.nextToken(); 
> //t is START_OBJECT
> XContentBuilder builder = 
> XContentFactory.contentBuilder( xContentType).copyCurrentStructure(parser); 
>  <-- fails with
>
> org.elasticsearch.common.jackson.core.JsonParseException: Unexpected 
> end-of-input: expected close marker for OBJECT (from [Source: [B@29569b73; 
> line: 1, column: 0])
>  at [Source: [B@29569b73; line: 5, column: 64]
> at 
> org.elasticsearch.common.jackson.core.JsonParser._constructError(JsonParser.java:1369)
> at 
> org.elasticsearch.common.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:532)
> at 
> org.elasticsearch.common.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:465)
> at 
> org.elasticsearch.common.jackson.core.base.ParserBase._handleEOF(ParserBase.java:491)
> at 
> org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser._skipWSOrEnd(UTF8StreamJsonParser.java:2513)
> at 
> org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:617)
> at 
> org.elasticsearch.common.jackson.core.base.GeneratorBase.copyCurrentStructure(GeneratorBase.java:401)
> at 
> org.elasticsearch.common.xcontent.json.JsonXContentGenerator.copyCurrentStructure(JsonXContentGenerator.java:310)
> at 
> org.elasticsearch.common.xcontent.XContentBuilder.copyCurrentStructure(XContentBuilder.java:1035)
>
> What did I do wrong?
> Thanks
> Hendrik
>
>
>
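Note that the quoted curl body is missing its closing brace (three `{` but only two `}` before the final quote), which is exactly the kind of truncated input that makes `copyCurrentStructure` hit end-of-input while looking for the OBJECT close marker. A sketch of the difference, using Python's json module as a stand-in for the Jackson parser:

```python
import json

truncated = '{ "query" : { "term" : { "user" : "kimchy" } }'   # missing final }
balanced = '{ "query" : { "term" : { "user" : "kimchy" } } }'

try:
    json.loads(truncated)
    parsed_truncated = True
except json.JSONDecodeError:
    # Parser reaches end of input while an object is still open.
    parsed_truncated = False

parsed_balanced = json.loads(balanced)
print(parsed_truncated)
print(parsed_balanced["query"])
```

The fix on the Elasticsearch side is simply to send well-formed JSON; the RestFilter code itself looks fine.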

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/162685e1-d872-474c-a30f-a651bf7ceb5d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


index design for web activity

2014-12-19 Thread Chen Wang
Hey Guys, 
I'd like to seek your suggestions on index design for web activities.
Let's say I have browse data, online purchase data, and store purchase 
data, and I need to keep a year of each. A year of browse data is around 
80G, online purchase data is around 50G, and store purchase (offline) data 
is around 1T.

I have to run queries like: find all customers who browsed item A in the 
past X months and also purchased item B online in the past Y months. 
Originally I used a complicated parent/child structure, which sometimes 
resulted in very bad performance, and I stored all browse/online 
purchase/store purchase data in one index distributed across 7 shards.

I have 7 machines with 128G each, and 1T hard disk.

Now I am trying to save each type of data in its own index, say 
browse_v1, onlinepurchase_v1, storepurchase_v1. Since this is time-based 
data, how should I decide whether to break them up monthly or simply keep 
them yearly? For browse (70G) and online purchase (50G), I think I can 
just use one index with one shard each. Or should I break them into 
monthly data instead? Breaking them into monthly indexes gives me the 
flexibility of adding/removing data, but it will also decrease query 
performance, right? (A search against 1 index now becomes a search 
against 12 indexes.)

For store data (1T), apparently I have to break it into at least monthly 
indexes, but each monthly index will still contain around 100G of data. 
With my current cluster, how many shards should I allocate to each 
monthly index? I am also concerned about query performance.
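As a back-of-envelope check (my numbers, not a recommendation from this thread): ~1T/year of store data is roughly 85–100G per monthly index, and a commonly cited comfortable ceiling is around 30G per shard, which suggests 3–4 shards per monthly index:

```python
def shards_needed(index_size_gb, target_shard_gb=30):
    """Back-of-envelope shard count: ceil(index size / target shard size).
    The 30 GB default is a rule-of-thumb ceiling, not a hard limit."""
    return -(-index_size_gb // target_shard_gb)  # ceiling division

monthly_store_gb = 1000 // 12  # ~83 GB per monthly index from ~1T/year
print(shards_needed(monthly_store_gb))
```

The right target shard size ultimately depends on I/O speed and recovery-time tolerance, so this is only a starting point for testing.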

Then, since I am storing them in separate indexes, to run the query I 
want I will need to do an application-level join. Is this the common 
way to handle such a use case?
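For what it's worth, an application-level join over two indexes usually reduces to intersecting id sets in the client. A minimal sketch (the id values and field name `customer_id` are hypothetical; in practice each set would come from a separate Elasticsearch query or terms aggregation):

```python
# Hypothetical result sets: customer ids matched by two separate queries,
# one against the browse index, one against the online purchase index.
browsed_item_a = {"c1", "c2", "c3", "c7"}
purchased_item_b = {"c2", "c7", "c9"}

# The "join" is just the intersection of the two id sets on the shared key.
matching_customers = browsed_item_a & purchased_item_b
print(sorted(matching_customers))
```

The practical constraint is the size of the intermediate id sets you have to pull back to the application, which is worth measuring before committing to this design.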

I know I should perform some testing first, but I hope someone with 
similar experience handling this can provide some guidance.

thanks in advance,
Chen




Re: Rolling restart

2014-12-19 Thread iskren . chernev
https://github.com/elasticsearch/elasticsearch-definitive-guide/pull/285

On Friday, December 19, 2014 1:01:53 PM UTC-8, Nikolas Everett wrote:
>
> I believe so.
>




Re: Cannot figure out how to add automatic timestamps when a document is indexed.

2014-12-19 Thread Jef Statham
Got the posting of the mapping figured out; I think I was just missing 
some outer braces in my request body.

However, enabling _timestamp on my document type still didn't result in 
automatic timestamps being added to the documents in Elasticsearch 
1.4.1.

My mapping now looks like:

{
  "solink_health_monitor": {
    "mappings": {
      "stats": {
        "_timestamp": {
          "enabled": true,
          "store": true
        },
        "properties": {
          "numChanges": {
            "type": "long"
          },
          "tag": {
            "type": "string"
          }
        }
      }
    }
  }
}

On Friday, 19 December 2014 15:33:19 UTC-5, Jef Statham wrote:
>
> Same error 
>
> "error": "ActionRequestValidationException[Validation Failed: 1: mapping 
> type is missing;]",
> "status": 400
>
> On Friday, 19 December 2014 15:18:46 UTC-5, Christian Hedegaard wrote:
>>
>>  Try this:
>>
>>  
>>
>> { 
>>
>> "template" : "whateverindex-*",
>>
>>"mappings" : {
>>
>>"events" : {
>>
>>  "_timestamp" : { "enabled" : true },
>>
>> }
>>
>>   }
>>
>> }
>>
>>  
>>
>> *From:* elasti...@googlegroups.com [mailto:elasti...@googlegroups.com] *On 
>> Behalf Of *Jef Statham
>> *Sent:* Friday, December 19, 2014 12:09 PM
>> *To:* elasti...@googlegroups.com
>> *Subject:* Cannot figure out how to add automatic timestamps when a 
>> document is indexed.
>>
>>  
>>  
>> I've been trying a PUT to an existing index 
>> /solink_health_monitor/_mapping to add a timestamp field to the document 
>> _source
>>  
>>  
>>  
>> "mappings": {
>>  
>> "stats": {
>>  
>> "properties": {
>>  
>> "@timestamp:": {
>>  
>> "enabled": true,
>>  
>> "store": true
>>  
>> }
>>  
>> }
>>  
>> }
>>  
>> }
>>  
>>  
>>  
>> I get the following response 
>>  
>>  
>>   
>> {
>>  
>> "error": "ActionRequestValidationException[Validation 
>> Failed: 1: mapping type is missing;]",
>>  
>> "status": 400
>>  
>> }
>>   
>>  
>>  
>



Re: Rolling restart

2014-12-19 Thread Nikolas Everett
I believe so.

On Fri, Dec 19, 2014 at 3:39 PM,  wrote:
>
>
>
> On Friday, December 19, 2014 12:31:33 PM UTC-8, Nikolas Everett wrote:
>>
>> You have to reenable allocation after the node comes back and wait for
>> the shards to initialize there.
>>
>
> So this means the tutorial is wrong (current version):
>
> 2. Disable allocation
> 3. stop node
> 4. ...
> 5. start node
> 6. Repeat 3-5 for the rest of your nodes
> 7. Re-enable shard allocation using ...
>
> It should be:
>
> 2. disable allocation
> 3. stop node
> 4. ...
> 5. start node
> 6. enable allocation
> 7. repeat steps 2-6 for the rest of your nodes
>



Re: Rolling restart

2014-12-19 Thread iskren . chernev


On Friday, December 19, 2014 12:31:33 PM UTC-8, Nikolas Everett wrote:
>
> You have to reenable allocation after the node comes back and wait for the 
> shards to initialize there.
>

So this means the tutorial is wrong (current version):

2. Disable allocation
3. stop node
4. ...
5. start node
6. Repeat 3-5 for the rest of your nodes
7. Re-enable shard allocation using ...

It should be:

2. disable allocation
3. stop node
4. ...
5. start node
6. enable allocation
7. repeat steps 2-6 for the rest of your nodes 
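The per-node disable/enable cycle above comes down to flipping `cluster.routing.allocation.enable` between `none` and `all` around each restart (the setting name as documented for ES 1.x; I'm assuming the transient cluster settings API here). A sketch of the two request bodies:

```python
import json

def allocation_settings(enabled):
    """Transient cluster-settings body toggling shard allocation.
    Would be sent as: PUT /_cluster/settings"""
    value = "all" if enabled else "none"
    return json.dumps(
        {"transient": {"cluster.routing.allocation.enable": value}})

# One rolling-restart cycle per node:
disable = allocation_settings(False)  # step 2: before stopping the node
enable = allocation_settings(True)    # step 6: after the node rejoins
print(disable)
print(enable)
```

Using transient (rather than persistent) settings means a full-cluster restart falls back to normal allocation automatically.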



Re: Cannot figure out how to add automatic timestamps when a document is indexed.

2014-12-19 Thread Jef Statham
Same error 

"error": "ActionRequestValidationException[Validation Failed: 1: mapping 
type is missing;]",
"status": 400

On Friday, 19 December 2014 15:18:46 UTC-5, Christian Hedegaard wrote:
>
>  Try this:
>
>  
>
> { 
>
> "template" : "whateverindex-*",
>
>"mappings" : {
>
>"events" : {
>
>  "_timestamp" : { "enabled" : true },
>
> }
>
>   }
>
> }
>
>  
>
> *From:* elasti...@googlegroups.com  [mailto:
> elasti...@googlegroups.com ] *On Behalf Of *Jef Statham
> *Sent:* Friday, December 19, 2014 12:09 PM
> *To:* elasti...@googlegroups.com 
> *Subject:* Cannot figure out how to add automatic timestamps when a 
> document is indexed.
>
>  
>  
> I've been trying a PUT to an existing index 
> /solink_health_monitor/_mapping to add a timestamp field to the document 
> _source
>  
>  
>  
> "mappings": {
>  
> "stats": {
>  
> "properties": {
>  
> "@timestamp:": {
>  
> "enabled": true,
>  
> "store": true
>  
> }
>  
> }
>  
> }
>  
> }
>  
>  
>  
> I get the following response 
>  
>  
>   
> {
>  
> "error": "ActionRequestValidationException[Validation Failed: 
> 1: mapping type is missing;]",
>  
> "status": 400
>  
> }
>   
>  
>  



Re: Rolling restart

2014-12-19 Thread Nikolas Everett
You have to reenable allocation after the node comes back and wait for the
shards to initialize there.

On Fri, Dec 19, 2014 at 3:23 PM,  wrote:
>
> I'm maintaining a small cluster of 9 nodes, and was trying to perform
> rolling restart as outlined here:
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_rolling_restarts.html#_rolling_restarts
>
> The problem is that after I disable reallocation and restart a single
> node, it appears it loses all its shards indefinitely (until I turn
> reallocation back on). So if I do this for all nodes in the cluster I'll run out of
> primary shards at some point.
>
> I have an upstart task for Elasticsearch, so I stopped nodes with that (it
> sends SIGTERM). I tried the shutdown API but it did have the same effect --
> after node joins the cluster, it doesn't own any shards, and that doesn't
> change if I wait for a while.
>
> Am I doing something wrong?
>



Rolling restart

2014-12-19 Thread iskren . chernev
I'm maintaining a small cluster of 9 nodes, and was trying to perform 
rolling restart as outlined 
here: 
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_rolling_restarts.html#_rolling_restarts

The problem is that after I disable reallocation and restart a single node, 
it appears it loses all its shards indefinitely (until I turn reallocation 
back on). So if I do this for all nodes in the cluster, I'll run out of 
primary shards at some point.

I have an upstart task for Elasticsearch, so I stopped nodes with that (it 
sends SIGTERM). I tried the shutdown API, but it had the same effect: after 
the node joins the cluster, it doesn't own any shards, and that doesn't 
change if I wait a while.

Am I doing something wrong?



RE: Cannot figure out how to add automatic timestamps when a document is indexed.

2014-12-19 Thread Christian Hedegaard
Try this:

{
  "template" : "whateverindex-*",
  "mappings" : {
    "events" : {
      "_timestamp" : { "enabled" : true }
    }
  }
}

From: elasticsearch@googlegroups.com [mailto:elasticsearch@googlegroups.com] On 
Behalf Of Jef Statham
Sent: Friday, December 19, 2014 12:09 PM
To: elasticsearch@googlegroups.com
Subject: Cannot figure out how to add automatic timestamps when a document is 
indexed.

I've been trying a PUT to an existing index /solink_health_monitor/_mapping to 
add a timestamp field to the document _source

"mappings": {
"stats": {
"properties": {
"@timestamp:": {
"enabled": true,
"store": true
}
}
}
}

I get the following response

{
"error": "ActionRequestValidationException[Validation Failed: 1: 
mapping type is missing;]",
"status": 400
}




Cannot figure out how to add automatic timestamps when a document is indexed.

2014-12-19 Thread Jef Statham
I've been trying a PUT to an existing index /solink_health_monitor/_mapping 
to add a timestamp field to the document _source

"mappings": {
"stats": {
"properties": {
"@timestamp:": {
"enabled": true,
"store": true
}
}
}
}

I get the following response 

{
"error": "ActionRequestValidationException[Validation Failed: 1: mapping 
type is missing;]",
"status": 400
}
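The "mapping type is missing" error usually means the request never names the mapping type: the body shown above wraps everything in a top-level "mappings" key, but the put-mapping API in ES 1.x expects the type either in the URL path or as the root key of the body. A sketch of a body that names the type (assuming the type is `stats`, as in the index above):

```python
import json

# Root key is the mapping type ("stats") itself, not a "mappings" wrapper.
body = {
    "stats": {
        "_timestamp": {"enabled": True, "store": True}
    }
}
# Would be sent as: PUT /solink_health_monitor/_mapping/stats
payload = json.dumps(body)
print(payload)
```

Note also that `_timestamp` is a root-level field of the type mapping, not an entry under `properties`, which is the other issue in the original request.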



Re: Using wildcards to speficy multiple indexes in Kibana, is it still not supported?

2014-12-19 Thread antoine . girbal
Vagif,
you should try out Kibana 4, with which you can define either a 
wildcard-based name (e.g. logstash-*) or a date-based name (e.g. 
logstash-YYYY.MM.DD).
Note that the date-based approach is typically much more efficient, since 
K4 can then query only the indices that match your time range, instead of 
all of them.
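The efficiency point above is that a date-based pattern lets the client expand a time range into a finite list of daily indices instead of hitting everything. A sketch of that expansion (the `logstash` prefix and dates are illustrative):

```python
from datetime import date, timedelta

def daily_indices(prefix, start, end):
    """Expand a date range into daily index names like logstash-2014.12.19,
    the kind of date-based pattern Kibana 4 can target directly."""
    names = []
    d = start
    while d <= end:
        names.append("%s-%s" % (prefix, d.strftime("%Y.%m.%d")))
        d += timedelta(days=1)
    return names

print(daily_indices("logstash", date(2014, 12, 17), date(2014, 12, 19)))
```

A three-day query then touches three indices rather than the whole year's worth.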


On Thursday, December 18, 2014 3:24:55 AM UTC-8, Vagif Abilov wrote:
>
> We have groups of multiple indexes whose names differ in a few characters, 
> and we want to query all indexes belonging to the same group. I've found 
> some old threads explaining that Kibana doesn't support wildcards in index 
> names (except for the date specifier) and that the only way to specify 
> such multiple indexes is to list all of them explicitly in Kibana 
> settings. But that's not flexible in our case.
>
> I wonder if Kibana supports or plan to support wildcards in index names.
>
> Thanks in advance
>
> Vagif Abilov
>



Re: ElasticSearch as a Search Engine for our Intranet Site

2014-12-19 Thread Gill Singh
Ok, thanks. So I guess it can't crawl or index websites itself, but it can 
serve as the search backend, perhaps through a plugin. Is there any 
combination of tools, or any other open-source search engine, that can 
serve this need? Thanks.

On Friday, December 19, 2014 1:04:44 PM UTC-5, Nikolas Everett wrote:
>
>
>
> On Fri, Dec 19, 2014 at 12:51 PM, Gill Singh  > wrote:
>>
>> Hi, I am new here, just joined this group!
>>
>> We are looking for a new Search Engine for our Intranet site. Can 
>> ElasticSearch be used for Crawling, Indexing and Searching Intranet type 
>> sites? We will need to crawl/index our web pages within Intranet, documents 
>> (PDF's Word etc) plus potentially Database indexing and then provide a user 
>> interface where our internal user base can search and show results. Thanks.
>>
>>
>>
> Elasticsearch doesn't really do any of that. It's more a building block 
> for implementing this stuff. For instance, MediaWiki has a plugin to 
> implement its search using Elasticsearch. I'm sure there are other 
> examples, but I happen to work on the MediaWiki one, so I have the link 
> on hand.
>
> Nik
>



Shard query cache not working for aggregates with simple date range filter

2014-12-19 Thread Luke Nezda
I'm using Elasticsearch 1.4.2 and was excited about the new shard query 
cache, but I am surprised that this simple terms aggregation doesn't seem 
to show up in the cache when I add a date range filter: 
https://gist.github.com/nezda/c65dd66785d5f1e4dbd4 -- I'm not referencing 
`now` or anything, so this seems like a bug to me.

Please advise,
- Luke



Re: ElasticSearch as a Search Engine for our Intranet Site

2014-12-19 Thread Nikolas Everett
On Fri, Dec 19, 2014 at 12:51 PM, Gill Singh 
wrote:
>
> Hi, I am new here, just joined this group!
>
> We are looking for a new Search Engine for our Intranet site. Can
> ElasticSearch be used for Crawling, Indexing and Searching Intranet type
> sites? We will need to crawl/index our web pages within Intranet, documents
> (PDF's Word etc) plus potentially Database indexing and then provide a user
> interface where our internal user base can search and show results. Thanks.
>
>
>
Elasticsearch doesn't really do any of that. It's more a building block for
implementing this stuff. For instance, MediaWiki has a plugin to implement
its search using Elasticsearch. I'm sure there are other examples, but I
happen to work on the MediaWiki one, so I have the link on hand.

Nik



Re: Wrong routing of TransportClient with sniffing enabled

2014-12-19 Thread Bae, Jae Hyeon
I already did, since it's the default, right? Also, I am seeing the
following, which shows the cluster name is not being ignored:

2014-12-19 06:58:08,187 WARN elasticsearch[Sun Girl][generic][T#49358]
transport - [Sun Girl] node null not part of the cluster Cluster
[es_logsummary], ignoring...


On Fri, Dec 19, 2014 at 1:56 AM, joergpra...@gmail.com <
joergpra...@gmail.com> wrote:
>
> You must set
>
> client.transport.ignore_cluster_name
>
> to "false", see
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/client/java-api/current/client.html
>
> Jörg
>
>
>
> On Fri, Dec 19, 2014 at 8:09 AM, Jae  wrote:
>
>> Hi
>>
>> I am using ES 1.1.0 with TransportClient.
>>
>> We observed wrong routing from TransportClient when we scale up the
>> cluster. For example, suppose that we have two ES clusters, es0, es1 and
>> es_sink_0 is the TransportClient talking to es0, es_sink_1 is one talking
>> to es1. If we scale up es1, it happens that es_sink_0 is sending data to
>> es1. We are using client.transport.sniff=true by default. This should not
>> happen theoretically because TransportClient will refresh its server list
>> through communicating with the cluster and new nodes should not join to the
>> wrong cluster.
>>
>> Is there anybody who has seen this problem before? Any comments will be
>> totally appreciated. We didn't find the root cause yet but this is the
>> really serious problem. So, temporarily, I want to turn off sniff and add
>> the feature that manually updates the server list through external discover
>> module.
>>



ElasticSearch as a Search Engine for our Intranet Site

2014-12-19 Thread Gill Singh
Hi, I am new here, just joined this group!

We are looking for a new search engine for our Intranet site. Can 
ElasticSearch be used for crawling, indexing, and searching Intranet-type 
sites? We would need to crawl/index our web pages within the Intranet and 
documents (PDFs, Word, etc.), plus potentially index a database, and then 
provide a user interface where our internal user base can search and view 
results. Thanks.



Updating Elastic search index along with DB

2014-12-19 Thread teseter


Hi,

To speed up search operations, we are planning to use Elasticsearch for 
transactional data. Data has to be fed into Elasticsearch from the DB. The 
main problem we foresee is keeping the data in sync between ES and the DB, 
since the transactional data can be updated. We are looking for ways to 
update the ES index after the transaction data is updated successfully, to 
avoid serving stale data from ES.

We are looking for different solution options to achieve this. If anybody 
has worked on a similar requirement, please let us know.

We are using Oracle DB. Should we use Oracle AQ on the database side or 
some Java JMS queue to update the Elasticsearch index?

Also, we want to know: when we update an ES index, will all the shards 
automatically get updated?





ES stopped logging

2014-12-19 Thread digitalx00
Topic says it all: I'm not seeing any log files in 
/var/log/elasticsearch. ES is running, and I haven't modified anything 
since starting. Any way to track this down? Thank you.



Re: Is ElasticSearch truly scalable for analytics?

2014-12-19 Thread joergpra...@gmail.com
Yes, I have 3 nodes and each index has 3 shards, on 32 core machines.

Each shard contains many segments, which can be read and written
concurrently by Lucene. Since Lucene 4, there have been massive
improvements in that area.

Maybe you have observed the effect that many shards per node for a single
index show different performance behavior when docs are added over long
periods of time. It simply takes longer before large segment merging
begins, because docs are more widely distributed and use smaller segment
sizes for a longer time. The downside is that huge segment counts may
occur (and many users encounter high file descriptor counts). With the
right configuration, you can set up a single shard per index on a node,
and segment merging / segment count is not a real problem.

You are right that shard size is a factor when moving the shard
around (into snapshot/restore), for export, or at recovery when the
node starts up. I think shard sizes over 30 GB are a bit heavy, but this
also depends on the speed of the I/O subsystem. With SSD or RAID 0, I can
operate at I/O rates of over 1 GB/sec for sequential reads. The shard size
factor has to be balanced out, either by using more than one index, a
higher number of nodes, or a faster I/O subsystem.
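The per-shard segment counts discussed above can be inspected with the cat APIs; a minimal sketch (the host, port, and index name "myindex" are assumptions, not from the thread):

```shell
# List segments per shard for one index; large segment counts show up here.
# Assumes a local node on port 9200 and an index named "myindex".
curl -XGET 'http://localhost:9200/_cat/segments/myindex?v'

# Per-shard doc counts and on-disk sizes, useful for judging shard weight:
curl -XGET 'http://localhost:9200/_cat/shards?v&h=index,shard,prirep,docs,store'
```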

Jörg



On Fri, Dec 19, 2014 at 3:42 PM, AlexR  wrote:

> Jorg, if you have a single large index and a cluster with 3 nodes, do you
> suggest creating just 3 shards even though each node has, say, 16 cores?
> With just three shards they will be very big and not much parallelism in
> computations will occur.
> Am I missing something?
>


Re: Is ElasticSearch truly scalable for analytics?

2014-12-19 Thread AlexR
Jorg, if you have a single large index and a cluster with 3 nodes, do you
suggest creating just 3 shards even though each node has, say, 16 cores? With
just three shards they will be very big and not much parallelism in
computations will occur.
Am I missing something?



Re: Oracle to Elasticsearch

2014-12-19 Thread Marian Valero
Yesss, sure. Thank you so much, and Merry Christmas!

2014-12-19 9:54 GMT-04:30 joergpra...@gmail.com :
>
> If you can give me a few calm days over the holidays, I will try to
> rewrite the Mongo DB example recipe with Oracle triggers to Elasticsearch
> with JDBC plugin.
>
> Jörg
>
> On Fri, Dec 19, 2014 at 3:08 PM, Marian Valero 
> wrote:
>
>> Yes, I know; that is just what they told me, and my question was to find
>> out what they recommend for me. But that is no reason for me not to use
>> JDBC; I also want to know how I can do it. I have seen an example of this
>> explained here: https://github.com/jprante/elasticsearch-river-jdbc but I
>> don't know how to use it very well. So if you can help me, I'd appreciate
>> it. Thanks.
>>
>> On Friday, December 19, 2014 at 09:28:41 UTC-4:30, Jörg Prante wrote:
>>>
>>> So you avoid all community supported plugins?
>>>
>>> As a side note, I wrote JDBC plugin for ES a few years ago so I can load
>>> my Oracle production data to ES.
>>>
>>> Jörg
>>>
>>> On Fri, Dec 19, 2014 at 2:54 PM, Marian Valero 
>>> wrote:
>>>
 ES and Logstash support tell me that at this time, they do not have an
 input for Logstash that accepts SQL from Oracle. They also do not recommend
 use of the existing JDBC river, as they don't support it.



 2014-12-19 9:21 GMT-04:30 joerg...@gmail.com :
>
> If you can be more specific, I might try to help as best as I can.
>
> Oracle offers Oracle RDBMS and Oracle also offers JDBC so all Oracle
> RDBMS provide JDBC support.
>
> Jörg
>
> On Fri, Dec 19, 2014 at 2:28 PM, Marian Valero 
> wrote:
>
>> They tell me that JDBC don't support
>>
>> On Friday, December 19, 2014 at 05:08:27 UTC-4:30, Jörg Prante wrote:
>>>
>>> With JDBC plugin, you can realize scenarios comparable to this one
>>>
>>> http://tebros.com/2011/09/keep-mongodb-and-oracle-in-sync-
>>> using-streams-advanced-queuing/
>>>
>>> so I am not sure why you do not want the JDBC plugin?
>>>
>>> Do you need a documentation?
>>>
>>> Jörg
>>>
>>> On Thu, Dec 18, 2014 at 7:07 PM, Marian Valero 
>>> wrote:
>>>
 I want to migrate data from Oracle to Elasticsearch to analyze it. I have
 been using Logstash to read a CSV file, but when I input this data I get
 more log entries than I inserted: for example, I have 10 lines and it has
 inserted 15 lines of logs. How can I fix that? This is my logstash.conf:

 input {
   file {
   path => "/home/mogangi/logstash-1.4.2/bin/eso.csv"
   type => "responselog"
   start_position => "beginning"
   }
 }
 filter {
 csv {
 columns => ["ID", "DELIVERYID", "MSGID", "RSPDATE",
 "RAWRESPONSE", "PARSEDRESPONSE", "SHORTCODE", "INSID", "MOBILENUMBER",
 "CLEANTEXT"]
 separator => ","
 }
 }
 output {
 elasticsearch {
 action => "index"
 host => "localhost"
 index => "logstash-%{+.MM.dd}"
 workers => 1
 }
 # stdout {
 # codec => rubydebug
 # }
 }

 My other question is: how can I connect Logstash to Oracle in real
 time, not locally, without using the JDBC river?

 Thanks


Re: Bulk API Indexing Status 500 Error NullPointerException[null]

2014-12-19 Thread drjz
I solved this by adding this setting in the index configuration.

"index" : {
"refresh_interval" : "-1"
} 

So it was a connectivity issue after all. :-)
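For reference, that setting can also be applied and reverted at runtime through the update-settings API; a minimal sketch (the host and the index name "myindex" are assumptions):

```shell
# Disable automatic refresh before a large bulk load.
curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "index" : { "refresh_interval" : "-1" }
}'

# ... run the bulk requests ...

# Restore the default refresh interval when the load is done.
curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "index" : { "refresh_interval" : "1s" }
}'
```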

/JZ


On Friday, December 19, 2014 2:49:05 PM UTC+1, Jörg Prante wrote:
>
> Can you show the data you send in the bulk request, and your code?
>
> Most probably you send null values for index or type.
>
> Jörg
>
> On Fri, Dec 19, 2014 at 2:28 PM, drjz > 
> wrote:
>
>> Dear all,
>>
>> I am encountering a weird issue. When I use the Bulk API (REST) to index 
>> documents, I am getting for the same documents each time the following 
>> error returned:
>>
>> "status":500,"error":"NullPointerException[null]"
>>
>> However, when I re-index the rejected documents, it will index them the 
>> second time.
>>
>> Is there a work-around for this? Is it a connectivity issue? 
>>
>> Help greatly appreciated!
>>
>> Thanks!
>> /JZ
>>
>>
>>
>>  
>>


Re: Oracle to Elasticsearch

2014-12-19 Thread joergpra...@gmail.com
If you can give me a few calm days over the holidays, I will try to rewrite
the Mongo DB example recipe with Oracle triggers to Elasticsearch with JDBC
plugin.

Jörg

On Fri, Dec 19, 2014 at 3:08 PM, Marian Valero 
wrote:

> Yes, I know; that is just what they told me, and my question was to find
> out what they recommend for me. But that is no reason for me not to use
> JDBC; I also want to know how I can do it. I have seen an example of this
> explained here: https://github.com/jprante/elasticsearch-river-jdbc but I
> don't know how to use it very well. So if you can help me, I'd appreciate
> it. Thanks.
>
> On Friday, December 19, 2014 at 09:28:41 UTC-4:30, Jörg Prante wrote:
>>
>> So you avoid all community supported plugins?
>>
>> As a side note, I wrote JDBC plugin for ES a few years ago so I can load
>> my Oracle production data to ES.
>>
>> Jörg
>>
>> On Fri, Dec 19, 2014 at 2:54 PM, Marian Valero 
>> wrote:
>>
>>> ES and Logstash support tell me that at this time, they do not have an
>>> input for Logstash that accepts SQL from Oracle. They also do not recommend
>>> use of the existing JDBC river, as they don't support it.
>>>
>>>
>>>
>>> 2014-12-19 9:21 GMT-04:30 joerg...@gmail.com :

 If you can be more specific, I might try to help as best as I can.

 Oracle offers Oracle RDBMS and Oracle also offers JDBC so all Oracle
 RDBMS provide JDBC support.

 Jörg

 On Fri, Dec 19, 2014 at 2:28 PM, Marian Valero 
 wrote:

> They tell me that JDBC don't support
>
> On Friday, December 19, 2014 at 05:08:27 UTC-4:30, Jörg Prante wrote:
>>
>> With JDBC plugin, you can realize scenarios comparable to this one
>>
>> http://tebros.com/2011/09/keep-mongodb-and-oracle-in-sync-
>> using-streams-advanced-queuing/
>>
>> so I am not sure why you do not want the JDBC plugin?
>>
>> Do you need a documentation?
>>
>> Jörg
>>
>> On Thu, Dec 18, 2014 at 7:07 PM, Marian Valero 
>> wrote:
>>
>>> I want to migrate data from Oracle to Elasticsearch to analyze it. I have
>>> been using Logstash to read a CSV file, but when I input this data I get
>>> more log entries than I inserted: for example, I have 10 lines and it has
>>> inserted 15 lines of logs. How can I fix that? This is my logstash.conf:
>>>
>>> input {
>>>   file {
>>>   path => "/home/mogangi/logstash-1.4.2/bin/eso.csv"
>>>   type => "responselog"
>>>   start_position => "beginning"
>>>   }
>>> }
>>> filter {
>>> csv {
>>> columns => ["ID", "DELIVERYID", "MSGID", "RSPDATE",
>>> "RAWRESPONSE", "PARSEDRESPONSE", "SHORTCODE", "INSID", "MOBILENUMBER",
>>> "CLEANTEXT"]
>>> separator => ","
>>> }
>>> }
>>> output {
>>> elasticsearch {
>>> action => "index"
>>> host => "localhost"
>>> index => "logstash-%{+.MM.dd}"
>>> workers => 1
>>> }
>>> # stdout {
>>> # codec => rubydebug
>>> # }
>>> }
>>>
>>> My other question is: how can I connect Logstash to Oracle in real
>>> time, not locally, without using the JDBC river?
>>>
>>> Thanks
>>>

Re: Oracle to Elasticsearch

2014-12-19 Thread Marian Valero
Yes, I know; that is just what they told me, and my question was to find out
what they recommend for me. But that is no reason for me not to use JDBC; I
also want to know how I can do it. I have seen an example of this explained
here: https://github.com/jprante/elasticsearch-river-jdbc but I don't know
how to use it very well. So if you can help me, I'd appreciate it. Thanks.

On Friday, December 19, 2014 at 09:28:41 UTC-4:30, Jörg Prante wrote:
>
> So you avoid all community supported plugins?
>
> As a side note, I wrote JDBC plugin for ES a few years ago so I can load 
> my Oracle production data to ES.
>
> Jörg
>
> On Fri, Dec 19, 2014 at 2:54 PM, Marian Valero  > wrote:
>
>> ES and Logstash support tell me that at this time, they do not have an
>> input for Logstash that accepts SQL from Oracle. They also do not recommend
>> use of the existing JDBC river, as they don't support it.
>>
>>
>>
>> 2014-12-19 9:21 GMT-04:30 joerg...@gmail.com:
>>>
>>> If you can be more specific, I might try to help as best as I can. 
>>>
>>> Oracle offers Oracle RDBMS and Oracle also offers JDBC so all Oracle 
>>> RDBMS provide JDBC support.
>>>
>>> Jörg
>>>
>>> On Fri, Dec 19, 2014 at 2:28 PM, Marian Valero >> > wrote:
>>>
 They tell me that JDBC don't support

 On Friday, December 19, 2014 at 05:08:27 UTC-4:30, Jörg Prante wrote:
>
> With JDBC plugin, you can realize scenarios comparable to this one
>
> http://tebros.com/2011/09/keep-mongodb-and-oracle-in-
> sync-using-streams-advanced-queuing/
>
> so I am not sure why you do not want the JDBC plugin?
>
> Do you need a documentation?
>
> Jörg
>
> On Thu, Dec 18, 2014 at 7:07 PM, Marian Valero  
> wrote:
>
>> I want to migrate data from Oracle to Elasticsearch to analyze it. I have
>> been using Logstash to read a CSV file, but when I input this data I get
>> more log entries than I inserted: for example, I have 10 lines and it has
>> inserted 15 lines of logs. How can I fix that? This is my logstash.conf:
>>
>> input {  
>>   file {
>>   path => "/home/mogangi/logstash-1.4.2/bin/eso.csv"
>>   type => "responselog"
>>   start_position => "beginning"
>>   }
>> }
>> filter {  
>> csv {
>> columns => ["ID", "DELIVERYID", "MSGID", "RSPDATE", 
>> "RAWRESPONSE", "PARSEDRESPONSE", "SHORTCODE", "INSID", "MOBILENUMBER", 
>> "CLEANTEXT"]
>> separator => ","
>> }
>> }
>> output {  
>> elasticsearch {
>> action => "index"
>> host => "localhost"
>> index => "logstash-%{+.MM.dd}"
>> workers => 1
>> }
>> # stdout {
>> # codec => rubydebug
>> # }
>> }
>>
>> My other question is: how can I connect Logstash to Oracle in real
>> time, not locally, without using the JDBC river?
>>
>> Thanks
>>

Re: Default shard allocation (where new shards are created)

2014-12-19 Thread Nikolas Everett
Check what curator is doing with your index. It's probably fiddling with
index.routing.allocation.include and index.routing.allocation.exclude.
When you create the new index, just set it to pick up the ssd tag. You'll
have to make sure that curator knows how to strip that tag when the time
comes to move it to the spinning disks.
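A sketch of that approach, using the disk_type tags from the thread (the host and the index name are assumptions):

```shell
# Create the new index pinned to the ssd-tagged nodes: all its shards,
# primary and replica, will only be allocated where node.disk_type is "ssd".
curl -XPUT 'http://localhost:9200/logs-2014.12.19' -d '{
  "settings" : {
    "index.routing.allocation.require.disk_type" : "ssd"
  }
}'

# Later, retag the index so its shards relocate to the spinning-disk nodes
# (this is the same index-level setting curator rewrites with --rule).
curl -XPUT 'http://localhost:9200/logs-2014.12.19/_settings' -d '{
  "index.routing.allocation.require.disk_type" : "spinning"
}'
```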

Nik

On Fri, Dec 19, 2014 at 3:36 AM, Robin Clarke  wrote:
>
> My current setup is with 10 nodes with ample space on spinning disks, and
> 20 nodes with smaller SSD disks.
> I would like my workflow to be that all data is initially indexed on the
> SSD nodes, after 10 days is reallocated to the spinning disks, after a
> further 10 days the index is closed, and after a further 70 days the
> indexes are deleted.
> The curator is great for moving them to the spinning disks, but what I am
> not sure about is how to define that initially all shards (primary and
> replica) of an index should be created on the ssd nodes.
> The spinning disk nodes are tagged:
> node.disk_type: spinning
> The ssd nodes are tagged
> node.disk_type: ssd
>
> To transfer after 10 days to spinning:
>  /usr/local/bin/curator --host es101 allocation --older-than 9 --rule
> disk_type=spinning
>
> But how do I define that the default location for all new shards should be
> on disk_type:ssd ?
>
> I have the example here, which
> I think could be modified like this:
>
> curl -XPUT localhost:9200/_cluster/settings -d '{
> "persistent" : {
> "cluster.routing.allocation.require.disk_type" : "ssd"
> }
> }'
>
> But for one this setting does not exist, and I'm not sure if this will
> stop the shards being reallocated to spinning later on...
>
> Any ideas how to implement my desired workflow?
>
> Thank you!
> -Robin-
>


Re: Oracle to Elasticsearch

2014-12-19 Thread joergpra...@gmail.com
So you avoid all community supported plugins?

As a side note, I wrote JDBC plugin for ES a few years ago so I can load my
Oracle production data to ES.

Jörg

On Fri, Dec 19, 2014 at 2:54 PM, Marian Valero 
wrote:

> ES and Logstash support tell me that at this time, they do not have an
> input for Logstash that accepts SQL from Oracle. They also do not recommend
> use of the existing JDBC river, as they don't support it.
>
>
>
> 2014-12-19 9:21 GMT-04:30 joergpra...@gmail.com :
>>
>> If you can be more specific, I might try to help as best as I can.
>>
>> Oracle offers Oracle RDBMS and Oracle also offers JDBC so all Oracle
>> RDBMS provide JDBC support.
>>
>> Jörg
>>
>> On Fri, Dec 19, 2014 at 2:28 PM, Marian Valero 
>> wrote:
>>
>>> They tell me that JDBC don't support
>>>
>>> On Friday, December 19, 2014 at 05:08:27 UTC-4:30, Jörg Prante wrote:

 With JDBC plugin, you can realize scenarios comparable to this one

 http://tebros.com/2011/09/keep-mongodb-and-oracle-in-
 sync-using-streams-advanced-queuing/

 so I am not sure why you do not want the JDBC plugin?

 Do you need a documentation?

 Jörg

 On Thu, Dec 18, 2014 at 7:07 PM, Marian Valero 
 wrote:

> I want to migrate data from Oracle to Elasticsearch to analyze it. I have
> been using Logstash to read a CSV file, but when I input this data I get
> more log entries than I inserted: for example, I have 10 lines and it has
> inserted 15 lines of logs. How can I fix that? This is my logstash.conf:
>
> input {
>   file {
>   path => "/home/mogangi/logstash-1.4.2/bin/eso.csv"
>   type => "responselog"
>   start_position => "beginning"
>   }
> }
> filter {
> csv {
> columns => ["ID", "DELIVERYID", "MSGID", "RSPDATE",
> "RAWRESPONSE", "PARSEDRESPONSE", "SHORTCODE", "INSID", "MOBILENUMBER",
> "CLEANTEXT"]
> separator => ","
> }
> }
> output {
> elasticsearch {
> action => "index"
> host => "localhost"
> index => "logstash-%{+.MM.dd}"
> workers => 1
> }
> # stdout {
> # codec => rubydebug
> # }
> }
>
> My other question is: how can I connect Logstash to Oracle in real time,
> not locally, without using the JDBC river?
>
> Thanks
>

Re: Oracle to Elasticsearch

2014-12-19 Thread Marian Valero
ES and Logstash support tell me that at this time, they do not have an
input for Logstash that accepts SQL from Oracle. They also do not recommend
use of the existing JDBC river, as they don't support it.



2014-12-19 9:21 GMT-04:30 joergpra...@gmail.com :
>
> If you can be more specific, I might try to help as best as I can.
>
> Oracle offers Oracle RDBMS and Oracle also offers JDBC so all Oracle RDBMS
> provide JDBC support.
>
> Jörg
>
> On Fri, Dec 19, 2014 at 2:28 PM, Marian Valero 
> wrote:
>
>> They tell me that JDBC don't support
>>
>> On Friday, December 19, 2014 at 05:08:27 UTC-4:30, Jörg Prante wrote:
>>>
>>> With JDBC plugin, you can realize scenarios comparable to this one
>>>
>>> http://tebros.com/2011/09/keep-mongodb-and-oracle-in-
>>> sync-using-streams-advanced-queuing/
>>>
>>> so I am not sure why you do not want the JDBC plugin?
>>>
>>> Do you need a documentation?
>>>
>>> Jörg
>>>
>>> On Thu, Dec 18, 2014 at 7:07 PM, Marian Valero 
>>> wrote:
>>>
 I want to migrate data from Oracle to Elasticsearch to analyze it. I have
 been using Logstash to read a CSV file, but when I input this data I get
 more log entries than I inserted: for example, I have 10 lines and it has
 inserted 15 lines of logs. How can I fix that? This is my logstash.conf:

 input {
   file {
   path => "/home/mogangi/logstash-1.4.2/bin/eso.csv"
   type => "responselog"
   start_position => "beginning"
   }
 }
 filter {
 csv {
 columns => ["ID", "DELIVERYID", "MSGID", "RSPDATE",
 "RAWRESPONSE", "PARSEDRESPONSE", "SHORTCODE", "INSID", "MOBILENUMBER",
 "CLEANTEXT"]
 separator => ","
 }
 }
 output {
 elasticsearch {
 action => "index"
 host => "localhost"
 index => "logstash-%{+.MM.dd}"
 workers => 1
 }
 # stdout {
 # codec => rubydebug
 # }
 }

 My other question is: how can I connect Logstash to Oracle in real time,
 not locally, without using the JDBC river?

 Thanks



Re: Oracle to Elasticsearch

2014-12-19 Thread joergpra...@gmail.com
If you can be more specific, I might try to help as best as I can.

Oracle offers Oracle RDBMS and Oracle also offers JDBC so all Oracle RDBMS
provide JDBC support.

Jörg

On Fri, Dec 19, 2014 at 2:28 PM, Marian Valero 
wrote:

> They tell me that JDBC don't support
>
> On Friday, December 19, 2014 at 05:08:27 UTC-4:30, Jörg Prante wrote:
>>
>> With JDBC plugin, you can realize scenarios comparable to this one
>>
>> http://tebros.com/2011/09/keep-mongodb-and-oracle-in-
>> sync-using-streams-advanced-queuing/
>>
>> so I am not sure why you do not want the JDBC plugin?
>>
>> Do you need a documentation?
>>
>> Jörg
>>
>> On Thu, Dec 18, 2014 at 7:07 PM, Marian Valero 
>> wrote:
>>
>>> I want to migrate data from Oracle to Elasticsearch to analyze it. I have
>>> been using Logstash to read a CSV file, but when I input this data I get
>>> more log entries than I inserted: for example, I have 10 lines and it has
>>> inserted 15 lines of logs. How can I fix that? This is my logstash.conf:
>>>
>>> input {
>>>   file {
>>>   path => "/home/mogangi/logstash-1.4.2/bin/eso.csv"
>>>   type => "responselog"
>>>   start_position => "beginning"
>>>   }
>>> }
>>> filter {
>>> csv {
>>> columns => ["ID", "DELIVERYID", "MSGID", "RSPDATE",
>>> "RAWRESPONSE", "PARSEDRESPONSE", "SHORTCODE", "INSID", "MOBILENUMBER",
>>> "CLEANTEXT"]
>>> separator => ","
>>> }
>>> }
>>> output {
>>> elasticsearch {
>>> action => "index"
>>> host => "localhost"
>>> index => "logstash-%{+.MM.dd}"
>>> workers => 1
>>> }
>>> # stdout {
>>> # codec => rubydebug
>>> # }
>>> }
>>>
>>> My other question is: how can I connect Logstash to Oracle in real time,
>>> not locally, without using the JDBC river?
>>>
>>> Thanks
>>>


Re: Bulk API Indexing Status 500 Error NullPointerException[null]

2014-12-19 Thread joergpra...@gmail.com
Can you show the data you send in the bulk request, and your code?

Most probably you send null values for index or type.

Jörg
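For reference, a well-formed bulk body pairs each action line, which must carry non-null index and type values, with its source document; a minimal sketch (the host, index, and type names are assumptions):

```shell
# Each action metadata line must name a non-null _index and _type, and the
# body must end with a newline. --data-binary preserves the newlines that
# the bulk format requires (plain -d can mangle them when reading files).
curl -XPOST 'http://localhost:9200/_bulk' --data-binary '{ "index" : { "_index" : "myindex", "_type" : "doc", "_id" : "1" } }
{ "title" : "first document" }
{ "index" : { "_index" : "myindex", "_type" : "doc", "_id" : "2" } }
{ "title" : "second document" }
'
```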

On Fri, Dec 19, 2014 at 2:28 PM, drjz  wrote:

> Dear all,
>
> I am encountering a weird issue. When I use the Bulk API (REST) to index
> documents, I am getting for the same documents each time the following
> error returned:
>
> "status":500,"error":"NullPointerException[null]"
>
> However, when I re-index the rejected documents, it will index them the
> second time.
>
> Is there a work-around for this? Is it a connectivity issue?
>
> Help greatly appreciated!
>
> Thanks!
> /JZ
>
>
>
>
>


Bulk API Indexing Status 500 Error NullPointerException[null]

2014-12-19 Thread drjz
Dear all,

I am encountering a weird issue. When I use the Bulk API (REST) to index 
documents, the same documents return the following error each time:

"status":500,"error":"NullPointerException[null]"

However, when I re-index the rejected documents, they are indexed 
successfully the second time.

Is there a work-around for this? Is it a connectivity issue? 

Help greatly appreciated!

Thanks!
/JZ



 



Re: Oracle to Elasticsearch

2014-12-19 Thread Marian Valero
They tell me that JDBC doesn't support that.

On Friday, December 19, 2014 at 05:08:27 UTC-4:30, Jörg Prante wrote:
>
> With JDBC plugin, you can realize scenarios comparable to this one
>
>
> http://tebros.com/2011/09/keep-mongodb-and-oracle-in-sync-using-streams-advanced-queuing/
>
> so I am not sure why you do not want the JDBC plugin.
>
> Do you need documentation?
>
> Jörg
>
> On Thu, Dec 18, 2014 at 7:07 PM, Marian Valero  > wrote:
>
>> I want to migrate data from Oracle to Elasticsearch in order to analyze it. I 
>> have been using Logstash to read a CSV file, but when I input this data, more 
>> log entries are inserted than the file contains; for example, the file has 10 
>> lines but 15 log entries are inserted. How can I fix that? This is my 
>> logstash.conf:
>>
>> input {
>>   file {
>>     path => "/home/mogangi/logstash-1.4.2/bin/eso.csv"
>>     type => "responselog"
>>     start_position => "beginning"
>>   }
>> }
>> filter {
>>   csv {
>>     columns => ["ID", "DELIVERYID", "MSGID", "RSPDATE", "RAWRESPONSE",
>>                 "PARSEDRESPONSE", "SHORTCODE", "INSID", "MOBILENUMBER", "CLEANTEXT"]
>>     separator => ","
>>   }
>> }
>> output {
>>   elasticsearch {
>>     action => "index"
>>     host => "localhost"
>>     index => "logstash-%{+YYYY.MM.dd}"
>>     workers => 1
>>   }
>>   # stdout {
>>   #   codec => rubydebug
>>   # }
>> }
>>
>> Another question: how can I connect Logstash to Oracle in real time, rather 
>> than from a local file, without using the JDBC river?
>>
>> Thanks
>>



Multilevel aggregation of terms when using nested objects.

2014-12-19 Thread Michael
We have a document format with a lot of nested objects, and we want 
to look at e.g. the location/language distribution. 

When performing an aggregation with two levels of nested objects, we 
get data only from the first level but not from the second; its count is 
always 0.

If we do a non-nested + nested aggregation, it works.

{
  "from": 0,
  "size": 0,
  "aggs": {
"Provider": {
  "terms": {
"field": "providerId",
"size": 10,
"order": {
  "_count": "desc"
}
  },
  "aggs": {
"Locations": {
  "nested": {
"path": "meta"
  },
  "aggs": {
"filter": {
  "filter": {
"term": {
  "meta.type": "locations"
}
  },
  "aggs": {
"Locations": {
  "terms": {
"field": "meta.value",
"size": 3,
"order": {
  "_count": "desc"
}
  }
}
  }
}
  }
}
  }
}
  },
  "query": {
"query_string": {
  "query": "barclays bank"
}
  }
}


If we do a nested + non-nested aggregation, it does not work.

{
  "from": 0,
  "size": 0,
  "aggs": {
"Locations": {
  "nested": {
"path": "meta"
  },
  "aggs": {
"filter": {
  "filter": {
"term": {
  "meta.type": "locations"
}
  },
  "aggs": {
"Locations": {
  "terms": {
"field": "meta.value",
"size": 10,
"order": {
  "_count": "desc"
}
  },
  "aggs": {
"Provider": {
  "terms": {
"field": "providerId",
"size": 3,
"order": {
  "_count": "desc"
}
  }
}
  }
}
  }
}
  }
}
  },
  "query": {
"query_string": {
  "query": "barclays bank"
}
  }
}


Is there something wrong with my query or is this an issue in elasticsearch?



ES upgrade 0.20.6 to 1.3.4 -> CorruptIndexException

2014-12-19 Thread Georgeta Boanea
Hi All,

After upgrading from ES 0.20.6 to 1.3.4 the following messages occurred:

[2014-12-19 10:02:06.714 GMT] WARN || 
elasticsearch[es-node-name][generic][T#14] 
org.elasticsearch.cluster.action.shard  [es-node-name] [index-name][3] 
sending failed shard for [index-name][3], node[qOTLmb3IQC2COXZh1n9O2w], 
[P], s[INITIALIZING], indexUUID [_na_], reason [Failed to start shard, 
message [IndexShardGatewayRecoveryException[[index-name][3] failed to fetch 
index version after copying it over]; nested: 
CorruptIndexException[[index-name][3] Corrupted index 
[corrupted_Ackui00SSBi8YXACZGNDkg] caused by: CorruptIndexException[did not 
read all bytes from file: read 112 vs size 113 (resource: 
BufferedChecksumIndexInput(NIOFSIndexInput(path="path/3/index/_uzm_2.del")))]]; 
]]

[2014-12-19 10:02:08.390 GMT] WARN || 
elasticsearch[es-node-name][generic][T#20] 
org.elasticsearch.indices.cluster  [es-node-name] [index-name][3] failed to 
start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: 
[index-name][3] failed to fetch index version after copying it over
at 
org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:152)
at 
org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.lucene.index.CorruptIndexException: [index-name][3] 
Corrupted index [corrupted_Ackui00SSBi8YXACZGNDkg] caused by: 
CorruptIndexException[did not read all bytes from file: read 112 vs size 
113 (resource: 
BufferedChecksumIndexInput(NIOFSIndexInput(path="path/3/index/_uzm_2.del")))]
at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:353)
at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:338)
at 
org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:119)
... 4 more

Shard [3] of the index remains unallocated and the cluster remains in a RED 
state.

curl -XGET 'http://localhost:48012/_cluster/health?pretty=true'
{
  "cluster_name" : "cluster-name",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 5,
  "number_of_data_nodes" : 5,
  "active_primary_shards" : 10,
  "active_shards" : 20,
  "relocating_shards" : 0,
  "initializing_shards" : 1,
  "unassigned_shards" : 1
}

If I do an optimize (curl -XPOST 
http://localhost:48012/index-name/_optimize?max_num_segments=1) on the 
index before the upgrade, everything is fine. Optimize only helps when done 
before the upgrade; if it is done after the upgrade, the problem remains.

Any idea why this problem occurs?
Is there another way to avoid it? I want to avoid optimize in the 
case of large data volumes.

Thank you,
Georgeta



Re: Globally disable analysis (i.e. no via per-field mapping)?

2014-12-19 Thread Mark Walkom
Ok that makes a bit more sense, but it seems the amount of CPU you will
save isn't worth the effort.

You could create an index template that matches fields with pattern "*" and
sets index: not_analyzed, that'd be easiest.
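An index template along those lines might be sketched as follows (dynamic_templates syntax as of Elasticsearch 1.x; the template body is built as a plain dict here, and the rule name is made up):

```python
import json

# Sketch: map every dynamically added string field as not_analyzed.
template = {
    "template": "*",  # apply to every index name
    "mappings": {
        "_default_": {
            "dynamic_templates": [{
                "strings_not_analyzed": {  # rule name is arbitrary
                    "match": "*",
                    "match_mapping_type": "string",
                    "mapping": {"type": "string", "index": "not_analyzed"},
                }
            }]
        }
    },
}
print(json.dumps(template, indent=2))
```

PUT a body like this to the `_template` endpoint and newly created indices should pick it up.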

On 19 December 2014 at 10:12, Eran Duchan  wrote:
>
> We use ElasticSearch to index our structured analytics data. We chose it
> for a few reasons:
>
>    1. All fields are indexed so we can search by any field or combination
>       of fields, including nested fields
>    2. Flexible and built-in geospatial searches
>    3. Can scale with our data, which grows at ~100M documents a day
>
> It's pretty much a generic datastore (though not the source of truth).
>
> While we do have quite a few string fields in our data, these are mostly
> enumeration values ("connected", "not connected") and in preliminary tests
> we've found that disabling analysis (per field) shows savings of ~5% CPU.
> Not a huge amount but every bit helps.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEYi1X84eb6wUqOfMNS8uBqvS5DdGR%2BXZx_zWZBdCKyfbkhH8A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Globally disable analysis (i.e. no via per-field mapping)?

2014-12-19 Thread Eran Duchan
Ahh, that should probably do the trick. Thanks.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e9e3bac7-a74b-443b-a479-bac4891e324b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Wrong routing of TransportClient with sniffing enabled

2014-12-19 Thread joergpra...@gmail.com
You must set

client.transport.ignore_cluster_name

to "false", see

http://www.elasticsearch.org/guide/en/elasticsearch/client/java-api/current/client.html

Jörg



On Fri, Dec 19, 2014 at 8:09 AM, Jae  wrote:

> Hi
>
> I am using ES 1.1.0 with TransportClient.
>
> We observed wrong routing from TransportClient when we scale up the
> cluster. For example, suppose that we have two ES clusters, es0, es1 and
> es_sink_0 is the TransportClient talking to es0, es_sink_1 is one talking
> to es1. If we scale up es1, it happens that es_sink_0 is sending data to
> es1. We are using client.transport.sniff=true by default. This should not
> happen theoretically because TransportClient will refresh its server list
> through communicating with the cluster and new nodes should not join to the
> wrong cluster.
>
> Is there anybody who has seen this problem before? Any comments will be
> totally appreciated. We didn't find the root cause yet but this is the
> really serious problem. So, temporarily, I want to turn off sniff and add
> the feature that manually updates the server list through external discover
> module.
>



Re: Globally disable analysis (i.e. no via per-field mapping)?

2014-12-19 Thread joergpra...@gmail.com
The "keyword" analyzer is the "none" analyzer you are looking for.

Example settings:

{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "default" : {
                    "type" : "keyword"
                }
            }
        }
    }
}

Jörg

On Fri, Dec 19, 2014 at 8:15 AM, Eran Duchan  wrote:

> I'd like not to use analysis across my schema to save a bit of CPU (I know
> the penalty this inflicts on searching). Right now I set "index":
> "not_analyzed" per field but this is cumbersome.
>
> I know I can choose between the default analyzers, but
> there's no "none" analyzer to choose from. Short of writing a custom one
> that does nothing, is there a way to globally disable analysis?
>
> Eran
>



Re: $ES_HEAP_SIZE

2014-12-19 Thread Johan Öhr
What you need to do is just create another init.d script; the difference 
between them should be that they point at different 
/etc/sysconfig/elasticsearch files, where you put the differing 
configs. Don't set the values in /usr/share/elasticsearch.in.sh (I think 
that is where they default to).

With this you can also spin up a new instance as a master node; just put the 
differences (different heap, etc.) in the sysconfig file.

On Friday, February 8, 2013 at 15:21:39 UTC+1, Shawn Ritchie wrote:
>
> Sorry again, what if I wanted to run them as services? I tried looking in 
> /service/elasticsearch.conf and init.d, but with no luck.
>
> On Friday, 8 February 2013 13:40:19 UTC+1, Clinton Gormley wrote:
>>
>> On Fri, 2013-02-08 at 04:27 -0800, Shawn Ritchie wrote: 
>> > I already read that post, but from what I understood or misunderstood, 
>> > it makes the assumption that you will have one instance of Elasticsearch 
>> > running on a machine. 
>> > 
>> > What I'd like to do with one Elasticsearch installation is launch two 
>> > instances of Elasticsearch with different /config, /data, /log and node 
>> > names. 
>> > 
>> > 
>> > Or is it that multiple instances of Elasticsearch on the same machine 
>> > run at the directory level, that is, two instances sharing 
>> > the /config, /data and /log directories together with the node name? 
>>
>> You can run multiple instances with the same paths (including logging 
>> and data). 
>>
>> If you just want to specify a different node name, then you could do so 
>> on the command line: 
>>
>> ./bin/elasticsearch -Des.node.name=node_1 
>> ./bin/elasticsearch -Des.node.name=node_2 
>>
>> If you want to change more than that, you could specify a specific 
>> config file: 
>>
>> ./bin/elasticsearch -Des.config=/path/to/config/file_1 
>> ./bin/elasticsearch -Des.config=/path/to/config/file_2 
>>
>> clint 
>>
>>
>>



Re: Oracle to Elasticsearch

2014-12-19 Thread joergpra...@gmail.com
With JDBC plugin, you can realize scenarios comparable to this one

http://tebros.com/2011/09/keep-mongodb-and-oracle-in-sync-using-streams-advanced-queuing/

so I am not sure why you do not want the JDBC plugin.

Do you need documentation?

Jörg

On Thu, Dec 18, 2014 at 7:07 PM, Marian Valero 
wrote:

> I want to migrate data from Oracle to Elasticsearch in order to analyze it. I
> have been using Logstash to read a CSV file, but when I input this data, more
> log entries are inserted than the file contains; for example, the file has 10
> lines but 15 log entries are inserted. How can I fix that? This is my
> logstash.conf:
>
> input {
>   file {
>     path => "/home/mogangi/logstash-1.4.2/bin/eso.csv"
>     type => "responselog"
>     start_position => "beginning"
>   }
> }
> filter {
>   csv {
>     columns => ["ID", "DELIVERYID", "MSGID", "RSPDATE", "RAWRESPONSE",
>                 "PARSEDRESPONSE", "SHORTCODE", "INSID", "MOBILENUMBER", "CLEANTEXT"]
>     separator => ","
>   }
> }
> output {
>   elasticsearch {
>     action => "index"
>     host => "localhost"
>     index => "logstash-%{+YYYY.MM.dd}"
>     workers => 1
>   }
>   # stdout {
>   #   codec => rubydebug
>   # }
> }
>
> Another question: how can I connect Logstash to Oracle in real time, rather
> than from a local file, without using the JDBC river?
>
> Thanks
>



Re: Globally disable analysis (i.e. no via per-field mapping)?

2014-12-19 Thread Eran Duchan
We use ElasticSearch to index our structured analytics data. We chose it 
for a few reasons:

   1. All fields are indexed so we can search by any field or combination 
   of fields, including nested fields
   2. Flexible and built-in geospatial searches
   3. Can scale with our data, which grows at ~100M documents a day

It's pretty much a generic datastore (though not the source of truth). 

While we do have quite a few string fields in our data, these are mostly 
enumeration values ("connected", "not connected") and in preliminary tests 
we've found that disabling analysis (per field) shows savings of ~5% CPU. 
Not a huge amount but every bit helps.



Re: Is ElasticSearch truly scalable for analytics?

2014-12-19 Thread joergpra...@gmail.com
A node does not send shard aggregations to the master, but to the client
node.

The basic idea of sharding in Elasticsearch is that shards spread over all
the nodes, and the shard count matches or comes close to the maximum number
of nodes. The shard distribution should be undistorted, that means, all
shards should be equal in size, volume, terms distribution etc. So in the
general case, a node has to process just one shard for aggregation, or
better, all nodes do equivalent work in the aggregation process.

There is not much gain in implementing an extra intermediate aggregation stage
per node only because some users put more than one shard per index on a
node and configure weighted indices where some nodes have more shards than
others. Instead, adding more nodes is the best method to achieve better
scalability for this index, or creating more indices on more nodes, and
combining them with index aliases.
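The shard-then-reduce flow described above can be modeled in a few lines (a toy imitation of a terms aggregation, not Elasticsearch code; the documents and field name are invented):

```python
from collections import Counter

def shard_top_terms(docs, field, shard_size):
    """Per-shard phase: count terms locally, return the top shard_size buckets."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return dict(counts.most_common(shard_size))

def reduce_terms(shard_results, size):
    """Coordinating-node phase: merge per-shard buckets, keep the global top `size`."""
    total = Counter()
    for buckets in shard_results:
        total.update(buckets)
    return total.most_common(size)

# Two "shards" holding documents with a lang field.
shards = [
    [{"lang": "en"}, {"lang": "en"}, {"lang": "de"}],
    [{"lang": "en"}, {"lang": "fr"}, {"lang": "fr"}],
]
partial = [shard_top_terms(shard, "lang", shard_size=10) for shard in shards]
print(reduce_terms(partial, size=2))  # [('en', 3), ('fr', 2)]
```

The heavy counting happens in the per-shard phase; the reduce only merges the already-summarized bucket lists, which is why equally sized shards keep the work balanced.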

Jörg



On Thu, Dec 18, 2014 at 7:22 PM, AlexR  wrote:

> Nick,
>
> I am not an expert in this area either but with multi-core processors (24,
> 32, 48) it is not uncommon to have fairly large number of shards on a node
> so 30 shards is not out of question
> I assumed that ES aggregates shard results on a node prior to shipping them
> to the master, but I do not know if that is true. It may very well be that a
> node sends per-shard aggregations to the master, in which case it is 32 x shard
> result size for our 32-shard node. Reducing the size of the network packet by
> 32 (even if it were just 8), and the work for the master by the same ratio, is
> no chump change. Somehow I think ES is already doing it :-) but who knows.
>
> Another potential benefit of node-level aggregation is that, when aggregating
> multiple shards on a single node, ES could resolve potential errors by
> aggregating all buckets and re-calculating buckets not present in every
> shard at a fairly low cost, while doing so across nodes is costly. On the
> other hand it may amplify the error across nodes; I do not know.
>
>
> On Thursday, December 18, 2014 11:26:37 AM UTC-5, Nikolas Everett wrote:
>>
>> I think aggregating 32 shards on one node is a bit degenerate.  I imagine
>> it's more typical to aggregate across one or two shards per node.  Don't get
>> me wrong, you can totally have nodes store and query ~100 shards each
>> without much trouble.  If aggregating across a bunch of shards per node
>> were a common thing I think a node level reduce step might help.  I'm
>> certainly no expert in the reduce code though.
>>
>> Nik
>>
>> On Thu, Dec 18, 2014 at 10:48 AM, Yifan Wang 
>> wrote:
>>>
>>> Sorry, if I did not make it clear. For sure I know aggregation is done
>>> on the node for each shard, but here is the challenge. Say we set
>>> shard_size=50,000. ES will aggregate on each shard and create buckets for
>>> the matching documents, and then send top 50,000 buckets to the client node
>>> for Reduce. Say we have 50 data nodes, and each node has 32 shards. This
>>> means we need to send 50,000 buckets from each shard to the client node for
>>> final aggregation. First, this may add heavy traffic to the network (what
>>> if we have 100 nodes?). And second, the client will need to aggregate on
>>> received 50*32*50,000 buckets. Would this cause any congestion on the
>>> client node? However if we can aggregate on the node first, meaning reduce
>>> from 32 buckets to only one bucket, then the client node only has to
>>> process 50 buckets. This would significantly reduce the network traffic and
>>> improve the scalability; plus, because we can set a relatively larger
>>> shard_size, it will improve the accuracy of the final results, which is
>>> another key issue we face with aggregations in a distributed environment.
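The bucket volume in the 50-node, 32-shards-per-node, shard_size=50,000 scenario described above can be tallied directly (a back-of-the-envelope sketch of the numbers in that paragraph):

```python
nodes, shards_per_node, shard_size = 50, 32, 50_000

# Every shard ships its own top list straight to the client node:
buckets_shard_level = nodes * shards_per_node * shard_size
print(buckets_shard_level)  # 80000000 buckets to reduce on the client

# Each node pre-merges its 32 local shard results before shipping:
buckets_node_level = nodes * shard_size
print(buckets_node_level)   # 2500000 buckets, a 32x reduction
```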
>>>
>>> So my key question is about the scalability particularly on
>>> aggregations. It seems to be a challenge in my experience. I just want to
>>> hear other people's experience. On heavy analytics applications, this will
>>> be a key.
>>>
>>> Of course, I also understand, adding node level aggregation may impact
>>> the overall performance. I am wondering if anyone has thought about or done
>>> anything in this aspect.
>>>
>>> BTW, I like ElasticSearch, but want to hear from the community on some
>>> of the key challenges.
>>>
>>>
>>>
>>> On Thursday, December 18, 2014 9:34:07 AM UTC-5, Adrien Grand wrote:

 +1 to what AlexR said. I think there is indeed a bad assumption that
 shards just forward data to the coordinating node, this is not the case.

 On Thu, Dec 18, 2014 at 1:09 AM, AlexR  wrote:
>
> if you take a terms aggregation, the heavy lifting of the aggregation
> is done on each node then aggregated results are combined on the master
> node. So if you have thousands of nodes and very high cardinality nested
> aggs the merging part may become a bottleneck but cost of doing actual
> aggregation in most cases is far higher than cost of merging results from
> reasonable number of shards. So in practice I thin

Re: Elasticsearch taking a long time for garbage collection

2014-12-19 Thread shriyansh jain
Okay, I have 10GB and 8GB; I will make them the same. Another 
issue I am facing is that the ELK stack is choking, and as soon as I 
restart Elasticsearch it starts working again.

Thanks!
Shriyansh

On Friday, December 19, 2014 12:47:43 AM UTC-8, Mark Walkom wrote:
>
> Sorry, I mean you should really have the heap the *same* size on each node.
>
> On 19 December 2014 at 09:16, shriyansh jain  > wrote:
>>
>> I have Heap Size on both the nodes.
>>
>> Using openjdk 1.7.0_09-icedtea and elasticsearch : "1.2.2".
>>
>> Thanks!
>> Shriyansh
>>
>> On Thursday, December 18, 2014 11:50:32 PM UTC-8, Mark Walkom wrote:
>>>
>>> You should really have heap the size on both nodes.
>>>
>>> What ES and java versions are you on?
>>>
>>> On 18 December 2014 at 19:54, shriyansh jain  
>>> wrote:

 Hi All,

 I am seeing some warning message in elasticsearh log files which are 
 taking pretty long time for garbage collection.

 [2014-12-17 10:12:23,789][INFO ][monitor.jvm  ] [Node1] 
 [gc][young][145219][4796] duration [901ms], collections [1]/[1.5s], total 
 [901ms]/[7.1m], memory [7.2gb]->[7.2gb]/[9.9gb], all_pools {[young] 
 [104.7mb]->[518.1kb]/[133.1mb]}{[survivor] 
 [8.3mb]->[12.6mb]/[16.6mb]}{[old] [7.1gb]->[7.2gb]/[9.8gb]}
 [2014-12-17 10:12:29,172][WARN ][monitor.jvm  ] [Node1] 
 [gc][young][145220][4797] duration [4.1s], collections [1]/[5s], total 
 [4.1s]/[7.2m], memory [7.2gb]->[7.4gb]/[9.9gb], all_pools {[young] 
 [518.1kb]->[109.4mb]/[133.1mb]}{[survivor] 
 [12.6mb]->[16.6mb]/[16.6mb]}{[old] [7.2gb]->[7.3gb]/[9.8gb]}
 [2014-12-17 10:12:33,188][INFO ][monitor.jvm  ] [Node1] 
 [gc][young][145224][4798] duration [791ms], collections [1]/[1s], total 
 [791ms]/[7.2m], memory [8.9gb]->[8.8gb]/[9.9gb], all_pools {[young] 
 [130.2mb]->[2kb]/[133.1mb]}{[survivor] [16.6mb]->[16.6mb]/[16.6mb]}{[old] 
 [8.7gb]->[8.8gb]/[9.8gb]}
 [2014-12-17 10:13:18,476][INFO ][monitor.jvm  ] [Node1] 
 [gc][young][145268][4799] duration [710ms], collections [1]/[1.3s], total 
 [710ms]/[7.2m], memory [2.9gb]->[2.8gb]/[9.9gb], all_pools {[young] 
 [128.6mb]->[3.5mb]/[133.1mb]}{[survivor] [16.6mb]->[11mb]/[16.6mb]}{[old] 
 [2.7gb]->[2.8gb]/[9.8gb]}


 I have a cluster of two Elasticsearch nodes, with 10GB and 8GB of heap. 
 Currently there are 60 shards with around 2.5GB of data. What can be 
 a probable reason for GC to take such a long time?


 Thanks!
 Shriyansh  




Re: Elasticsearch taking a long time for garbage collection

2014-12-19 Thread Mark Walkom
Sorry, I mean you should really have the heap the *same* size on each node.

On 19 December 2014 at 09:16, shriyansh jain 
wrote:
>
> I have Heap Size on both the nodes.
>
> Using openjdk 1.7.0_09-icedtea and elasticsearch : "1.2.2".
>
> Thanks!
> Shriyansh
>
> On Thursday, December 18, 2014 11:50:32 PM UTC-8, Mark Walkom wrote:
>>
>> You should really have heap the size on both nodes.
>>
>> What ES and java versions are you on?
>>
>> On 18 December 2014 at 19:54, shriyansh jain 
>> wrote:
>>>
>>> Hi All,
>>>
>>> I am seeing some warning message in elasticsearh log files which are
>>> taking pretty long time for garbage collection.
>>>
>>> [2014-12-17 10:12:23,789][INFO ][monitor.jvm  ] [Node1]
>>> [gc][young][145219][4796] duration [901ms], collections [1]/[1.5s], total
>>> [901ms]/[7.1m], memory [7.2gb]->[7.2gb]/[9.9gb], all_pools {[young]
>>> [104.7mb]->[518.1kb]/[133.1mb]}{[survivor] [8.3mb]->[12.6mb]/[16.6mb]}{[old]
>>> [7.1gb]->[7.2gb]/[9.8gb]}
>>> [2014-12-17 10:12:29,172][WARN ][monitor.jvm  ] [Node1]
>>> [gc][young][145220][4797] duration [4.1s], collections [1]/[5s], total
>>> [4.1s]/[7.2m], memory [7.2gb]->[7.4gb]/[9.9gb], all_pools {[young]
>>> [518.1kb]->[109.4mb]/[133.1mb]}{[survivor]
>>> [12.6mb]->[16.6mb]/[16.6mb]}{[old] [7.2gb]->[7.3gb]/[9.8gb]}
>>> [2014-12-17 10:12:33,188][INFO ][monitor.jvm  ] [Node1]
>>> [gc][young][145224][4798] duration [791ms], collections [1]/[1s], total
>>> [791ms]/[7.2m], memory [8.9gb]->[8.8gb]/[9.9gb], all_pools {[young]
>>> [130.2mb]->[2kb]/[133.1mb]}{[survivor] [16.6mb]->[16.6mb]/[16.6mb]}{[old]
>>> [8.7gb]->[8.8gb]/[9.8gb]}
>>> [2014-12-17 10:13:18,476][INFO ][monitor.jvm  ] [Node1]
>>> [gc][young][145268][4799] duration [710ms], collections [1]/[1.3s], total
>>> [710ms]/[7.2m], memory [2.9gb]->[2.8gb]/[9.9gb], all_pools {[young]
>>> [128.6mb]->[3.5mb]/[133.1mb]}{[survivor] [16.6mb]->[11mb]/[16.6mb]}{[old]
>>> [2.7gb]->[2.8gb]/[9.8gb]}
>>>
>>>
>>> I have a cluster of two elasticsearch nodes, with 10GB and 8GB of heap.
>>> Currently there are 60 shards in elasticsearch with around 2.5GB of data. What
>>> can be a probable reason for GC to take such a long time?
>>>
>>>
>>> Thanks!
>>> Shriyansh
>>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEYi1X_-PK4%3DN-5%3Dc2X8WnybYfkrw5qBcQ7i8%3D2j3kNbyp%2B7uA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Default shard allocation (where new shards are created)

2014-12-19 Thread Robin Clarke
My current setup is with 10 nodes with ample space on spinning disks, and 
20 nodes with smaller SSD disks.
I would like my workflow to be that all data is initially indexed on the 
SSD nodes, after 10 days is reallocated to the spinning disks, after a 
further 10 days the index is closed, and after a further 70 days the 
indexes are deleted.
The curator is great for moving them to the spinning disks, but what I am 
not sure about is how to define that initially all shards (primary and 
replica) of an index should be created on the ssd nodes.
The spinning disk nodes are tagged:
node.disk_type: spinning
The ssd nodes are tagged
node.disk_type: ssd

To transfer after 10 days to spinning:
 /usr/local/bin/curator --host es101 allocation --older-than 9 --rule 
disk_type=spinning

But how do I define that the default location for all new shards should be 
on disk_type:ssd ?

I have the example here which I think could be modified like this:

curl -XPUT localhost:9200/_cluster/settings -d '{
"persistent" : {
"cluster.routing.allocation.require.disk_type" : "ssd"
}
}'

But for one thing, this setting does not exist, and I'm not sure whether it
would stop the shards being reallocated to spinning later on...
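One way to make the SSD nodes the default for newly created indices is index-level shard allocation filtering applied through an index template. A minimal sketch, where the template name, index pattern (daily logstash-* indices), and host are assumptions:

```shell
# Every index created after this template is PUT will require its shards
# (primary and replica) to be allocated on nodes tagged node.disk_type: ssd.
curl -XPUT 'http://localhost:9200/_template/ssd_first' -d '{
  "template": "logstash-*",
  "settings": {
    "index.routing.allocation.require.disk_type": "ssd"
  }
}'
```

Since the template only affects an index at creation time, Curator's allocation command should still work as before: it overwrites the same index-level setting with disk_type=spinning after 10 days, which triggers the relocation.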

Any ideas how to implement my desired workflow?

Thank you!
-Robin-

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/18615b7c-8e99-42ee-b54d-ef06a1888181%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Performance difference between REST and Java API

2014-12-19 Thread joergpra...@gmail.com
The idea for tracing the issue is to rebuild the query from simple to
complex and compare.

I would start with the query and add the filters step by step, to identify
the part which causes trouble.

One possible cause is that the REST API is using some defaults which have to
be set explicitly in the Java API.
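The step-by-step comparison is easy to script on the REST side. A rough sketch, where the hostname, index name, and the step1…step3 body files are assumptions:

```shell
#!/bin/sh
# Time the same search as the body grows from a bare query to the full
# query-plus-filters version, to see which addition costs the most.
for body in step1-query-only.json step2-add-filter.json step3-full.json; do
  t=$(curl -s -o /dev/null -w '%{time_total}' \
      -XGET 'http://localhost:9200/myindex/_search' -d @"$body")
  echo "$body: ${t}s"
done
```

Running the same sequence through the Java client then shows at which step the two diverge.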

Jörg

On Fri, Dec 19, 2014 at 1:18 AM, Marie Jacob  wrote:

> Thanks Jörg.
>
> I tried that, and it seems to make no difference. All I could gather are
> these graphs (attached) from the bigdesk plugin that are showing
> performance for the query and fetch phase (the first is during ES java
> client use, the second is using the REST calls) . It's really weird that
> the FETCH phase is taking any time at all, if I set _source = false.
>
>
>
> On Thursday, December 18, 2014 5:33:05 PM UTC-5, Jörg Prante wrote:
>>
>> How about this?
>>
>> https://gist.github.com/anonymous/509b3db873a30d8961ed#comment-1359074
>>
>> Jörg
>>
>> On Thu, Dec 18, 2014 at 7:48 PM, Marie Jacob  wrote:
>>>
>>>
>>> I'm benchmarking results for an ES cluster, using both the REST api and
>>> native Java client. We're getting very different response times, between
>>> each of these (the REST api is doing approximately 50% better in the 95th
>>> percentile). I was wondering what the cause of this is, since the search
>>> requests look identical to me (unless I missed something).
>>>
>>> Here are the queries:
>>>
>>> https://gist.github.com/anonymous/509b3db873a30d8961ed
>>>
>>> Our setup: 20 node cluster, 20 shards/2 replicas, and benchmarking from
>>> a separate machine with JMeter (an HTTP sampler for each of the 20 nodes,
>>> and a separate test plan with a Java sampler initializing the ES java
>>> client, and sending queries).
>>>
>>> Any help appreciated.
>>>
>>>
>>> -M
>>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHRJSL_wJoj-Q%2BypXC3Ej79fNPC-6j-BEJ-LMpPXU8yDQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


native script sorting geoPoints

2014-12-19 Thread Ronny Deter
Hi, I have a nested object:

"root": {
  "userLocations": {
    "type": "nested",
    "include_in_parent": true,
    "properties": {
      "id": {
        "type": "integer"
      },
      "location": {
        "properties": {
          "city": {
            "type": "string"
          },
          "zipCode": {
            "type": "string"
          },
          "countryIsoCode": {
            "type": "string"
          },
          "longitude": {
            "type": "float"
          },
          "latitude": {
            "type": "float"
          },
          "geoPoint": {
            "type": "geo_point"
          }
        }
      }
    }
  }
}

I have one root document with n userLocations as nested objects.
I can search with a native script for the closest distance to a point,
but I cannot filter on the userLocations.id of the closest location,
because userLocations.id is sorted numerically.

Here is my script:

import java.util.List;
import java.util.Map;

import org.elasticsearch.common.Nullable;
import org.elasticsearch.common.geo.GeoDistance;
import org.elasticsearch.common.geo.GeoPoint;
import org.elasticsearch.common.unit.DistanceUnit;
import org.elasticsearch.index.fielddata.ScriptDocValues;
import org.elasticsearch.script.AbstractLongSearchScript;

/**
 * Created by rdeter on 18.12.14.
 */
public class NestedLocation extends AbstractLongSearchScript {
    double distance = 10;   // start radius in km; only points closer than this win
    double lat;
    double lon;
    long   id;              // note: stays at its default of 0 if no point is within the radius

    double tmpDistance;

    public NestedLocation(@Nullable Map<String, Object> params) {
        lat = ((Double) params.get("lat")).doubleValue();
        lon = ((Double) params.get("lon")).doubleValue();
    }

    @Override
    public long runAsLong() {

        List<GeoPoint> list = ((ScriptDocValues.GeoPoints)
                doc().get("userLocations.location.geoPoint")).getValues();
        List<Long> userLocationIds = ((ScriptDocValues.Longs)
                doc().get("userLocations.id")).getValues();

        int z = 0;

        // track the id of the geo point closest to (lat, lon)
        for (GeoPoint geoPoint : list) {
            tmpDistance = GeoDistance.PLANE.calculate(geoPoint.getLat(),
                    geoPoint.getLon(), lat, lon, DistanceUnit.KILOMETERS);
            if (tmpDistance < distance) {
                distance = tmpDistance;
                id = userLocationIds.get(z);
            }
            z++;
        }

        return id;
    }
}
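For reference, a hedged sketch of how a native script like this can be invoked as a sort from the search side, assuming it was registered under the name nested_location in elasticsearch.yml (the registration, index name, and coordinates are assumptions):

```shell
# Assumes the plugin registered the script factory, e.g.
#   script.native.nested_location.type: <fully.qualified.FactoryClass>
# in elasticsearch.yml.
curl -XGET 'http://localhost:9200/myindex/_search' -d '{
  "query": { "match_all": {} },
  "sort": {
    "_script": {
      "lang": "native",
      "script": "nested_location",
      "type": "number",
      "order": "asc",
      "params": { "lat": 52.52, "lon": 13.4 }
    }
  }
}'
```

Note that the script as written returns an id, not a distance, so sorting by its value orders documents by that id rather than by proximity, which matches the problem described above.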



-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/46a872e7-cb9e-4230-b15f-74b035890a28%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch taking a long time for garbage collection

2014-12-19 Thread shriyansh jain
I have Heap Size on both the nodes.

Using openjdk 1.7.0_09-icedtea and elasticsearch : "1.2.2".

Thanks!
Shriyansh
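As a side note for comparing the two nodes, the per-node heap limits and cumulative GC counters can be read from the nodes stats API; a minimal sketch (host and port are assumptions):

```shell
# Shows heap_max per node plus young/old collection counts and total times,
# useful for confirming the 10GB vs 8GB difference and where GC time goes.
curl -s 'http://localhost:9200/_nodes/stats/jvm?pretty'
```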

On Thursday, December 18, 2014 11:50:32 PM UTC-8, Mark Walkom wrote:
>
> You should really have heap the size on both nodes.
>
> What ES and java versions are you on?
>
> On 18 December 2014 at 19:54, shriyansh jain  > wrote:
>>
>> Hi All,
>>
>> I am seeing some warning messages in the elasticsearch log files which
>> show garbage collection taking a pretty long time.
>>
>> [2014-12-17 10:12:23,789][INFO ][monitor.jvm  ] [Node1] 
>> [gc][young][145219][4796] duration [901ms], collections [1]/[1.5s], total 
>> [901ms]/[7.1m], memory [7.2gb]->[7.2gb]/[9.9gb], all_pools {[young] 
>> [104.7mb]->[518.1kb]/[133.1mb]}{[survivor] 
>> [8.3mb]->[12.6mb]/[16.6mb]}{[old] [7.1gb]->[7.2gb]/[9.8gb]}
>> [2014-12-17 10:12:29,172][WARN ][monitor.jvm  ] [Node1] 
>> [gc][young][145220][4797] duration [4.1s], collections [1]/[5s], total 
>> [4.1s]/[7.2m], memory [7.2gb]->[7.4gb]/[9.9gb], all_pools {[young] 
>> [518.1kb]->[109.4mb]/[133.1mb]}{[survivor] 
>> [12.6mb]->[16.6mb]/[16.6mb]}{[old] [7.2gb]->[7.3gb]/[9.8gb]}
>> [2014-12-17 10:12:33,188][INFO ][monitor.jvm  ] [Node1] 
>> [gc][young][145224][4798] duration [791ms], collections [1]/[1s], total 
>> [791ms]/[7.2m], memory [8.9gb]->[8.8gb]/[9.9gb], all_pools {[young] 
>> [130.2mb]->[2kb]/[133.1mb]}{[survivor] [16.6mb]->[16.6mb]/[16.6mb]}{[old] 
>> [8.7gb]->[8.8gb]/[9.8gb]}
>> [2014-12-17 10:13:18,476][INFO ][monitor.jvm  ] [Node1] 
>> [gc][young][145268][4799] duration [710ms], collections [1]/[1.3s], total 
>> [710ms]/[7.2m], memory [2.9gb]->[2.8gb]/[9.9gb], all_pools {[young] 
>> [128.6mb]->[3.5mb]/[133.1mb]}{[survivor] [16.6mb]->[11mb]/[16.6mb]}{[old] 
>> [2.7gb]->[2.8gb]/[9.8gb]}
>>
>>
>> I have a cluster of two elasticsearch nodes, with 10GB and 8GB of heap.
>> Currently there are 60 shards in elasticsearch with around 2.5GB of data. What
>> can be a probable reason for GC to take such a long time?
>>
>>
>> Thanks!
>> Shriyansh  
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/bf0052a6-aba0-4b7f-b84f-051b5bd7fb44%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.