Re: HashingTFModel/IDFModel in Structured Streaming

2017-11-09 Thread Davis Varghese
Bago, the code I wrote is not generating the issue. In our case, we build an ML pipeline from a UI, and it is done in a particular fashion so that a user can create a pipeline behind the scenes using drag and drop. I have yet to dig deeper to recreate the same as standalone code. Meanwhile I am sharing

Re: Timeline for Spark 2.3

2017-11-09 Thread Nick Pentreath
+1 I think that’s practical On Fri, 10 Nov 2017 at 03:13, Erik Erlandson wrote: > +1 on extending the deadline. It will significantly improve the logistics > for upstreaming the Kubernetes back-end. Also agreed, on the general > realities of reduced bandwidth over the

How can I connect hawq using SparkSession?

2017-11-09 Thread Chi Zewen
Hi, I have some shells that need to run on Spark. I want to read and write data stored in hawq, but I don’t want to modify my shells too much. How should I modify SparkSession to make Spark work with hawq just like it works with Hive? For example, I can use “.enableHiveSupport()” to enable Hive

skip.header.line.count is ignored in HiveContext

2017-11-09 Thread sunerhan1...@sina.com
Hello, I've got a table in Hive (its path points to CSV-formatted files) which is configured to skip the header row using TBLPROPERTIES("skip.header.line.count"="1"). When querying from Hive the header row is not included in the data, but when running the same query via HiveContext I get the

Re: Timeline for Spark 2.3

2017-11-09 Thread Erik Erlandson
+1 on extending the deadline. It will significantly improve the logistics for upstreaming the Kubernetes back-end. Also agreed, on the general realities of reduced bandwidth over the Nov-Dec holiday season. Erik On Thu, Nov 9, 2017 at 6:03 PM, Matei Zaharia wrote: >

Re: Timeline for Spark 2.3

2017-11-09 Thread Matei Zaharia
I’m also +1 on extending this to get Kubernetes and other features in. Matei > On Nov 9, 2017, at 4:04 PM, Anirudh Ramanathan > wrote: > > This would help the community on the Kubernetes effort quite a bit - giving > us additional time for reviews and testing for

Re: Timeline for Spark 2.3

2017-11-09 Thread Anirudh Ramanathan
This would help the community on the Kubernetes effort quite a bit - giving us additional time for reviews and testing for the 2.3 release. On Thu, Nov 9, 2017 at 3:56 PM, Justin Miller wrote: > That sounds fine to me. I’m hoping that this ticket can make it into

Re: Timeline for Spark 2.3

2017-11-09 Thread Justin Miller
That sounds fine to me. I’m hoping that this ticket can make it into Spark 2.3: https://issues.apache.org/jira/browse/SPARK-18016 It’s causing some pretty considerable problems when we alter the columns to be nullable, but we are OK for now

Timeline for Spark 2.3

2017-11-09 Thread Michael Armbrust
According to the timeline posted on the website, we are nearing branch cut for Spark 2.3. I'd like to propose pushing this out towards mid to late December for a couple of reasons and would like to hear what people think. 1. I've done release management during the Thanksgiving / Christmas time

Re: Jenkins upgrade/Test Parallelization & Containerization

2017-11-09 Thread Xin Lu
Yeah Sean so the setup I had didn't really care about parallelizing in Maven. It just stashed the built artifacts and moved them onto the slaves running tests and tests for each submodule ran in a separate docker container. After each subbuild was done the build logs were transferred back and

Re: Task failures and other problems

2017-11-09 Thread Vadim Semenov
Probably not Oracle but Cloudera. Jan, I think your DataNodes might be overloaded. I'd suggest reducing `spark.executor.cores` if you run executors alongside DataNodes, so the DataNode process would get some resources. The other thing you can do is to increase `dfs.client.socket-timeout` in

Re: Task failures and other problems

2017-11-09 Thread Jan-Hendrik Zab
Jörn Franke writes: > Maybe contact Oracle support? Something like that would be the last option I guess, university money is usually hard to come by for such things. > Do you have maybe accidentally configured some firewall rules? Routing > issues? Maybe only one of the

Re: Task failures and other problems

2017-11-09 Thread Jörn Franke
Maybe contact Oracle support? Do you have maybe accidentally configured some firewall rules? Routing issues? Maybe only one of the nodes... > On 9. Nov 2017, at 20:04, Jan-Hendrik Zab wrote: > > > Hello! > > This might not be the perfect list for the issue, but I tried

Re: HashingTFModel/IDFModel in Structured Streaming

2017-11-09 Thread Bago Amirbekian
Davis, were you able to find an example? Anything you have could help. On Wed, Nov 1, 2017 at 8:53 PM Davis Varghese wrote: > Sure. I will get one over the weekend > > > > -- > Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/ > >

Task failures and other problems

2017-11-09 Thread Jan-Hendrik Zab
Hello! This might not be the perfect list for the issue, but I tried user@ previously with the same issue, with a bit less information, to no avail. So I'm hoping someone here can point me in the right direction. We're using Spark 2.2 on CDH 5.13 (Hadoop 2.6 with patches) and a lot of our