date:20170814

Re: Spark 2.2 streaming with append mode: empty output

2017-08-14 Thread Tathagata Das

In append mode, the aggregation outputs a row only when the watermark has been crossed and the corresponding aggregate is *final*, that is, will not be updated any more.

See http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#handling-late-data-and-watermarking

On Mon,
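The rule above can be illustrated without a cluster. The following is a minimal pure-Python sketch, not Spark code: the function name `simulate_append_mode`, the integer-second timestamps, and the exact comparison used to decide when a window is final are assumptions made for illustration, and Spark's real watermark bookkeeping differs in detail. It shows why a 1-day window with a 1-minute watermark can stay silent for a long time: nothing is emitted until event time has moved past the end of the window plus the delay.

```python
def simulate_append_mode(batches, window_size, watermark_delay):
    """Simplified model of append-mode output for a windowed sum.

    batches: list of micro-batches, each a list of (event_time, value) pairs.
    Returns one list per batch holding the (window_start, total) rows that
    became final in that batch. Hypothetical helper, not a Spark API.
    """
    sums = {}              # window_start -> running sum for still-open windows
    watermark = None       # max event time seen so far minus the delay
    max_event_time = None
    output = []
    for batch in batches:
        for event_time, value in batch:
            start = event_time - event_time % window_size
            # Data for a window that is already final is too late: drop it.
            if watermark is not None and start + window_size <= watermark:
                continue
            sums[start] = sums.get(start, 0) + value
            if max_event_time is None or event_time > max_event_time:
                max_event_time = event_time
        if max_event_time is not None:
            watermark = max_event_time - watermark_delay
        # A window is emitted once, and only after the watermark passes its end.
        final = sorted(
            s for s in sums
            if watermark is not None and s + window_size <= watermark
        )
        output.append([(s, sums.pop(s)) for s in final])
    return output


# 1-day windows, 1-minute watermark delay (times in seconds).
result = simulate_append_mode(
    [[(10, 100), (3600, 50)], [(86430, 7)], [(86461, 1)]],
    window_size=86400,
    watermark_delay=60,
)
print(result)  # [[], [], [(0, 150)]]
```

The first two batches produce no rows because the watermark is still inside the first day; the first window's total appears only after an event arrives more than one minute past the end of that day.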

Apache Spark on Kubernetes: New Release for Spark 2.2

2017-08-14 Thread Erik Erlandson

The Apache Spark on Kubernetes Community Development Project is pleased to announce the latest release of Apache Spark with native Scheduler Backend for Kubernetes! Features provided in this release include:

- Cluster-mode submission of Spark jobs to a Kubernetes cluster
- Support

Spark 2.2 streaming with append mode: empty output

2017-08-14 Thread Ashwin Raju

Hi, I am running Spark 2.2 and trying out structured streaming. I have the following code:

    from pyspark.sql import functions as F

    df = frame \
        .withWatermark("timestamp", "1 minute") \
        .groupby(F.window("timestamp", "1 day"), *groupby_cols) \
        .agg(F.sum('bytes'))

    query =

Re: [Spark Core] Is it possible to insert a function directly into the Logical Plan?

2017-08-14 Thread Jörn Franke

What about accumulators?

> On 14. Aug 2017, at 20:15, Lukas Bradley wrote:
>
> We have had issues with gathering status on long running jobs. We have
> attempted to draw parallels between the Spark UI/Monitoring API and our code
> base. Due to the separation between

Re: [Spark Core] Is it possible to insert a function directly into the Logical Plan?

2017-08-14 Thread Vadim Semenov

Something like this, maybe?

    import org.apache.spark.sql.{DataFrame, Dataset}
    import org.apache.spark.sql.catalyst.expressions.AttributeReference
    import org.apache.spark.sql.execution.LogicalRDD
    import org.apache.spark.sql.catalyst.encoders.RowEncoder

    val df: DataFrame = ???
    val spark = df.sparkSession
    val

[Spark Core] Is it possible to insert a function directly into the Logical Plan?

2017-08-14 Thread Lukas Bradley

We have had issues with gathering status on long running jobs. We have attempted to draw parallels between the Spark UI/Monitoring API and our code base. Due to the separation between code and the execution plan, even having a guess as to where we are in the process is difficult. The

DAGScheduler - two runtimes

2017-08-14 Thread Kaepke, Marc

Hi everyone, I’m a Spark newbie and have one question: What is the difference between the duration measures in my log/console output?

    17/08/14 12:48:58 INFO DAGScheduler: ResultStage 232 (reduce at VertexRDDImpl.scala:90) finished in 0.026 s
    17/08/14 12:48:58 INFO DAGScheduler: Job 13

WARN HIVE: Failed to access metastore. This class should not accessed in runtime

2017-08-14 Thread Ascot Moss

Hi, I got the following error when starting the Spark Thrift Server:

    WARN HIVE: Failed to access metastore. This class should not accessed in runtime.
    org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate

8 matches
