Re: Spark 1.6 Release window is not updated in Spark-wiki

2015-10-01 Thread Sean Owen
My guess is that the 1.6 merge window should close at the end of November (two months from now)? I can update it, but wanted to check whether anyone else has a preferred tentative plan.

Dataframe nested schema inference from Json without type conflicts

2015-10-01 Thread Ewan Leith
Hi all, We really like the ability to infer a schema from JSON contained in an RDD, but when we're using Spark Streaming on small batches of data, we sometimes find that Spark infers a more specific type than it should, for example if the JSON in that small batch only contains integer
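The conflict Ewan describes can be sketched without Spark at all. The snippet below is a plain-Python stand-in (not Spark's actual inference code): a hypothetical per-batch inferencer types a field by whatever the current small batch happens to contain, so two batches can disagree on the same field.

```python
import json

def infer_type(value):
    """Infer a simple type name for a JSON value -- a toy stand-in
    for per-batch schema inference, not Spark's implementation."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"

# A small batch where "price" happens to contain only integers...
batch1 = [json.loads('{"price": 10}')]
# ...and a later batch where the same field is fractional.
batch2 = [json.loads('{"price": 10.5}')]

schema1 = {k: infer_type(v) for rec in batch1 for k, v in rec.items()}
schema2 = {k: infer_type(v) for rec in batch2 for k, v in rec.items()}

# The two batches disagree on the type of "price": "long" vs "double".
print(schema1["price"], schema2["price"])
```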

Re: Task Execution

2015-10-01 Thread Rishitesh Mishra
Depending on the number of cores configured for the executor, the scheduler will assign that many tasks, so yes, they execute in parallel. On 30 Sep 2015 14:51, "gsvic" wrote: > Concerning task execution, does a worker execute its assigned tasks in parallel > or sequentially?
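A back-of-the-envelope sketch of that parallelism: each task occupies `spark.task.cpus` cores (default 1) out of the `spark.executor.cores` assigned to each executor. The helper function itself is hypothetical, but the arithmetic matches the scheduling rule Rishitesh describes.

```python
def max_parallel_tasks(num_executors, executor_cores, cpus_per_task=1):
    """Upper bound on simultaneously running tasks: each task holds
    cpus_per_task cores (spark.task.cpus) out of the executor_cores
    (spark.executor.cores) assigned to each of num_executors executors."""
    return num_executors * (executor_cores // cpus_per_task)

# e.g. 4 executors with 8 cores each can run 32 tasks at once.
print(max_parallel_tasks(4, 8))  # -> 32
```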

Re: Spark 1.6 Release window is not updated in Spark-wiki

2015-10-01 Thread Patrick Wendell
BTW, the merge window for 1.6 is September and October. The QA window is November, and we expect to ship probably in early December. We are on a 3-month release cadence, with the caveat that there is some pipelining: as we finish release X we are already starting on release X+1. - Patrick

Re: Spark 1.6 Release window is not updated in Spark-wiki

2015-10-01 Thread Patrick Wendell
Ah, I can update it. Usually I do it after the release is cut. It's just a standard 3-month cadence.

Re: Dataframe nested schema inference from Json without type conflicts

2015-10-01 Thread Reynold Xin
You can pass the schema into json directly, can't you?
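As a rough illustration of the trade-off (a plain-Python stand-in, not the Spark API, where you would pass a StructType via read.schema(...).json(...)): a fixed schema applied per record avoids type conflicts, but silently drops any field it does not know about, which is the downside Ewan raises in his reply.

```python
import json

# Fixed schema: field name -> converter. A stand-in for passing an
# explicit schema into the JSON reader; the field names are illustrative.
SCHEMA = {"id": int, "price": float, "name": str}

def apply_schema(line):
    """Coerce one JSON record to the fixed schema; unknown fields vanish."""
    record = json.loads(line)
    return {field: conv(record[field]) if field in record else None
            for field, conv in SCHEMA.items()}

row = apply_schema('{"id": 1, "price": 10, "name": "widget", "extra": true}')
# "extra" is silently lost, and "price" is coerced to the declared type.
print(row)
```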

Re: Dataframe nested schema inference from Json without type conflicts

2015-10-01 Thread Yin Huai
Hi Ewan, For your use case, you only need schema inference to pick up the structure of your data (basically you want Spark SQL to infer the type of complex values like arrays and structs but keep the type of primitive values as strings), right? Thanks, Yin
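A minimal sketch of that idea, assuming a toy recursive inferencer (not Spark's implementation): recurse into structs and arrays to recover the nested structure, but type every primitive as a string, so no two batches can ever conflict on a leaf type.

```python
def infer(value):
    """Infer nested structure (structs and arrays) but type every
    primitive as 'string' -- a sketch of the behaviour Yin describes."""
    if isinstance(value, dict):
        return {k: infer(v) for k, v in value.items()}
    if isinstance(value, list):
        return [infer(value[0])] if value else ["string"]
    return "string"

schema = infer({"user": {"id": 7, "tags": ["a", "b"]}, "price": 9.99})
# Structure is preserved; every leaf is 'string'.
print(schema)
```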

Re: Tungsten off heap memory access for C++ libraries

2015-10-01 Thread Paul Wais
Update for those who are still interested: djinni is a nice tool for generating Java/C++ bindings. Before today djinni's Java support was only aimed at Android, but now djinni works with (at least) Debian, Ubuntu, and CentOS. djinni will help you run C++ code in-process with the caveat that

Re: Dataframe nested schema inference from Json without type conflicts

2015-10-01 Thread Ewan Leith
We could, but if a client sends some unexpected records in the schema (which happens more often than I'd like; our schema seems to constantly evolve), it's fantastic how Spark picks up on that data and includes it. Passing in a fixed schema loses that nice additional ability, though it's what we'll

Spark 1.6 Release window is not updated in Spark-wiki

2015-10-01 Thread Meethu Mathew
Hi, In the https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage the current release window has not been changed from 1.5. Can anybody give an idea of the expected dates for the 1.6 release? Regards, Meethu Mathew Senior Engineer Flytxt

Re: Dataframe nested schema inference from Json without type conflicts

2015-10-01 Thread Ewan Leith
Exactly, that's a much better way to put it. Thanks, Ewan

[ANNOUNCE] Announcing Spark 1.5.1

2015-10-01 Thread Reynold Xin
Hi All, Spark 1.5.1 is a maintenance release containing stability fixes. This release is based on the branch-1.5 maintenance branch of Spark. We *strongly recommend* that all 1.5.0 users upgrade to this release. The full list of bug fixes is here: http://s.apache.org/spark-1.5.1