Hello All,
I am using Spark to process some files in parallel.
While most files can be processed within 3 seconds, we occasionally get
stuck on 1 or 2 files that never finish (or take more than 48 hours).
Since it is a 3rd-party file conversion tool, we are not able to
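A common workaround for hung third-party calls is to wrap each call in a Future with a hard deadline, so a stuck file yields an error record instead of blocking the task forever. This is only a sketch in plain Scala: `convertWithTimeout` and its placeholder body are hypothetical names standing in for the real conversion call.

```scala
import scala.concurrent.{Await, Future, TimeoutException}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical wrapper: run the 3rd-party conversion with a hard time limit.
// Right(result) on success, Left(message) when the deadline is exceeded.
def convertWithTimeout(path: String, limit: Duration): Either[String, String] = {
  val work = Future {
    // Placeholder for the real 3rd-party call, e.g. converter.convert(path)
    s"converted:$path"
  }
  try Right(Await.result(work, limit))
  catch { case _: TimeoutException => Left(s"timed out: $path") }
}
```

Note that `Await.result` only abandons the wait; the underlying thread keeps running, so for calls that are truly stuck you may also want Spark's task reaper (`spark.task.reaper.enabled=true`) so that executors with unkillable tasks can be terminated.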
This is my first email to this mailing list, so I apologize if I made any
errors.
My team is going to build an application, and I'm investigating some
options for distributed compute systems. We want to perform computations on
large matrices.
The requirements are as follows:
1.
How can I achieve the following by passing a row to a UDF?
val df1 = df.withColumn("col_Z",
when($"col_x" === "a", $"col_A")
.when($"col_x" === "b", $"col_B")
.when($"col_x" === "c", $"col_C")
.when($"col_x" === "d",
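One common pattern for this is to pack the whole row into a `struct` and accept it as a `Row` inside the UDF, which replaces the long `when` chain with ordinary Scala logic. This is a sketch: the column names follow the snippet above, but the sample data is made up.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{col, struct, udf}

val spark = SparkSession.builder().master("local[1]").appName("row-udf").getOrCreate()
import spark.implicits._

// Made-up data matching the column names in the question.
val df = Seq(("a", "A1", "B1", "C1"), ("b", "A2", "B2", "C2"))
  .toDF("col_x", "col_A", "col_B", "col_C")

// The UDF receives the entire row as a struct and picks a column by name.
val pick = udf((r: Row) =>
  r.getAs[String]("col_x") match {
    case "a" => r.getAs[String]("col_A")
    case "b" => r.getAs[String]("col_B")
    case "c" => r.getAs[String]("col_C")
    case _   => null
  }
)

// struct over all columns passes the whole row as one argument.
val df1 = df.withColumn("col_Z", pick(struct(df.columns.map(col): _*)))
```

Only the return type of the UDF needs a supported schema here; the `Row`-typed input works because Spark skips input type coercion when it cannot derive a schema for the argument.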
Thank you for the reply, Sean. Sure, 2.4.x should be an LTS version.
The main reason for a 2.4.4 release (before 3.0.0) is to have a better basis
for comparison with 3.0.0.
For example, SPARK-27798 is an old bug, but its correctness issue was only
exposed in Spark 2.4.3.
It would be great if we can
We will certainly want a 2.4.4 release eventually. In fact, I'd expect
2.4.x to be maintained for longer than the usual 18 months, as it's the
last 2.x branch.
It doesn't need to happen before 3.0, but could. Usually maintenance
releases happen 3-4 months apart and the last one was 2 months ago. If
Hi, All.
Spark 2.4.3 was released two months ago (8th May).
As of today (9th July), there are 45 fixes in `branch-2.4`, including the
following correctness or blocker issues.
- SPARK-26038 Decimal toScalaBigInt/toJavaBigInteger not work for decimals not fitting in long
- SPARK-26045
Hello,
I have the below Spark Structured Streaming code, and I was expecting the results
to be printed to the console every 10 seconds. But I notice the console sink
fires only every ~2 minutes or more.
What could be the issue?
def streaming(): Unit = {
System.setProperty("hadoop.home.dir",
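One likely cause is the trigger: without an explicit trigger, Structured Streaming starts the next micro-batch as soon as the previous one finishes, so if each batch takes ~2 minutes (slow source listing, expensive sink writes), output appears every ~2 minutes regardless of your expectations. A sketch of setting a 10-second processing-time trigger, using a `rate` source as a stand-in for the query's actual input:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder().master("local[1]").appName("trigger-demo").getOrCreate()

// Stand-in source; replace with the query's real input stream.
val df = spark.readStream.format("rate").option("rowsPerSecond", "1").load()

val query = df.writeStream
  .format("console")
  .outputMode("append")
  // Attempt a micro-batch every 10 seconds rather than back-to-back.
  .trigger(Trigger.ProcessingTime("10 seconds"))
  .start()
```

Note that even with this trigger, a batch that takes longer than 10 seconds delays the next one, so if batches really are taking ~2 minutes the fix is to find out why (check the batch duration in the Spark UI's Structured Streaming tab) rather than just setting a trigger.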