Re: scalastyle violation on mvn install but not on mvn package

2017-05-04 Thread Mark Hamstra
The check goal of the scalastyle plugin runs during the "verify" phase, which is between "package" and "install"; so running just "package" will not run scalastyle:check. On Thu, May 4, 2017 at 7:45 AM, yiskylee wrote: > ./build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0
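For illustration, the difference shows up directly on the command line (the direct goal invocation uses the scalastyle-maven-plugin's standard scalastyle:check syntax; the profile flags are copied from the thread):

    # "package" stops before the verify phase, so scalastyle never runs:
    ./build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
    # "verify" (and any later phase, like "install") runs scalastyle:check:
    ./build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean verify
    # or invoke the goal directly without running the rest of the lifecycle:
    ./build/mvn scalastyle:check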

long running jobs with Spark

2017-05-04 Thread Afshin, Bardia
Starting long-running jobs with upstart on Linux (spark-submit) is super slow. I can see that only a small percentage of the CPU is being utilized, and applying nice -n 20 to the process doesn't seem to do anything. Has anyone dealt with long-running processes/jobs on Spark and has any best practices

Spark Streaming 2.1 - slave parallel recovery

2017-05-04 Thread Dominik Safaric
Hi all, I’m running a cluster consisting of a master and four slaves. The cluster runs a Spark application that reads data from a Kafka topic over a window of time and writes the data back to Kafka. Checkpointing is enabled, backed by HDFS. However, although Spark periodically commits checkpoints
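For context, the standard checkpoint-based recovery pattern for such a job looks roughly like this (a sketch in pyspark; the checkpoint path, app name, and batch interval are placeholders, and the Kafka wiring is elided):

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    checkpoint_dir = "hdfs:///spark/checkpoints/my-app"  # placeholder path

    def create_context():
        # Called only on a cold start; on restart the context and its
        # DStream lineage are rebuilt from the checkpoint data instead.
        sc = SparkContext(appName="kafka-window-app")
        ssc = StreamingContext(sc, batchDuration=10)
        ssc.checkpoint(checkpoint_dir)
        # ... Kafka input stream and windowed transformations go here ...
        return ssc

    ssc = StreamingContext.getOrCreate(checkpoint_dir, create_context)
    ssc.start()
    ssc.awaitTermination()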

Re: [Spark Streaming] Dynamic Broadcast Variable Update

2017-05-04 Thread Gene Pang
As Tim pointed out, Alluxio (renamed from Tachyon) may be able to help you. Here is some documentation on how to run Alluxio and Spark together, and here is a blog post on a Spark Streaming + Alluxio use case

scalastyle violation on mvn install but not on mvn package

2017-05-04 Thread yiskylee
./build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package works, but ./build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean install triggers a scalastyle violation error. Is the scalastyle check not run on package but only on install? To install, should I

Re: unable to find how to integrate SparkSession with a Custom Receiver.

2017-05-04 Thread kant kodali
got it! Thank you! On Thu, May 4, 2017 at 12:58 AM, Tathagata Das wrote: > Structured Streaming is not designed to integrate with receivers. The > sources in Structured Streaming are designed for providing stronger > fault-tolerance guarantees by precisely tracking

Re: Kerberos impersonation of a Spark Context at runtime

2017-05-04 Thread Abel Rincón
Hi Mathieu, Stratio is working on it; we have a solution running which accomplishes our use case. Could you share your use case with us? Here are the video and slides of our work on this topic: https://spark-summit.org/east-2017/events/kerberizing-spark/ Regards, Abel. 2017-05-04 15:01

Re: Kerberos impersonation of a Spark Context at runtime

2017-05-04 Thread Saisai Shao
Spark doesn't currently support impersonating different users at runtime. Spark's proxy-user support is application level, which means that when it is set through --proxy-user, the whole application runs as that user. On Thu, May 4, 2017 at 5:13 PM, matd wrote: > Hi folks,

Normalize columns items for Onehotencoder

2017-05-04 Thread issues solution
Hi, I have 3 data frames that do not have the same items in their labeled column. I mean: data frame 1 has collabled values a, b, c; data frame 2 has collabled values a, w, z. When I encode the first data frame, I get:

    collabled  a  b  c
    a          1  0  0
    b          0  1  0
    c
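One common fix (a hedged sketch; df1, df2, and the column name follow the question, and dropLast=False is my assumption so every label keeps its own slot): fit a single StringIndexer on the union of the label columns, so that a, b, c, w, z all get one consistent index, then run the same indexer and encoder over each frame.

    from pyspark.ml.feature import StringIndexer, OneHotEncoder

    # Fit one indexer on all labels from both frames so the mapping is shared.
    all_labels = df1.select("collabled").union(df2.select("collabled"))
    indexer = StringIndexer(inputCol="collabled", outputCol="labelIndex").fit(all_labels)

    # dropLast=False keeps one vector slot per distinct label.
    encoder = OneHotEncoder(inputCol="labelIndex", outputCol="labelVec", dropLast=False)
    encoded1 = encoder.transform(indexer.transform(df1))
    encoded2 = encoder.transform(indexer.transform(df2))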

RE: [Spark Streaming] - Killing application from within code

2017-05-04 Thread Sidney Feiner
Instead of setting up an additional mechanism, would it be "clean" to catch the error back in the driver, and use SparkContext.stop() there? And because the SparkContext can't be serialized, I can't catch the error inside the rdd.foreach function. What I did eventually, and it worked:
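A minimal sketch of the driver-side pattern under discussion (not the poster's actual code; process_record is a hypothetical task function): the exception from the failed action is re-raised on the driver, where it can be caught and the context stopped.

    # Catch the failure on the driver, where sc is available, not in the task.
    try:
        rdd.foreach(process_record)  # process_record may raise on bad input
    except Exception as err:
        print("Fatal error, shutting down: %s" % err)
        sc.stop()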

Kerberos impersonation of a Spark Context at runtime

2017-05-04 Thread matd
Hi folks, I have a Spark application executing various jobs for different users simultaneously, via several Spark sessions on several threads. My customer would like to kerberize his Hadoop cluster. I wonder if there is a way to configure impersonation such that each of these jobs would be run

Re: Create multiple columns in pyspak with one shot

2017-05-04 Thread Rick Moritz
In Scala you can first define your columns, and then use the list-to-vararg expander :_* in a select call, something like this:

    val cols = colnames.map(col).map(column => lit(0))
    dF.select(cols: _*)

I assume something similar should be possible in Java as well; from your snippet it's

Re: unable to find how to integrate SparkSession with a Custom Receiver.

2017-05-04 Thread Tathagata Das
Structured Streaming is not designed to integrate with receivers. The sources in Structured Streaming are designed for providing stronger fault-tolerance guarantees by precisely tracking records by their offsets (e.g. Kafka offsets). This is different from the Receiver APIs which did not require
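For example, the offset-tracked Kafka source replaces a receiver entirely (a sketch; the broker and topic names are placeholders, and it assumes the spark-sql-kafka package is on the classpath):

    # Structured Streaming tracks the Kafka offsets itself; no receiver involved.
    events = spark.readStream \
        .format("kafka") \
        .option("kafka.bootstrap.servers", "broker1:9092") \
        .option("subscribe", "mytopic") \
        .load()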

Create multiple columns in pyspak with one shot

2017-05-04 Thread issues solution
Hi, how can we create multiple columns iteratively? I mean, how can you create empty columns inside a loop? Because with

    for i in listl:
        df = df.withColumn(i, F.lit(0))

we get a stack overflow. How can we do that with a list of columns instead, like df.select([F.col(i).lit(0) for i in
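A sketch of the single-select version (listl as in the question): build all the columns first and issue one projection, so the logical plan stays flat instead of growing one nested projection per withColumn call.

    from pyspark.sql import functions as F

    # One select for all new columns instead of a withColumn loop.
    zero_cols = [F.lit(0).alias(name) for name in listl]
    df = df.select("*", *zero_cols)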

unable to find how to integrate SparkSession with a Custom Receiver.

2017-05-04 Thread kant kodali
Hi All, I have a Custom Receiver that implements the onStart() and onStop() methods of the Receiver class, and I am trying to figure out how to integrate it with SparkSession, since I want to do stateful analytics using Structured Streaming. I couldn't find it in the docs. Any idea? When I was doing

Re: Hive on Spark is not populating correct records

2017-05-04 Thread Vikash Pareek
After lots of experiments, I have figured out that it was a potential bug in Cloudera's Hive on Spark: Hive on Spark does not produce consistent output for aggregate functions. Hopefully, it will be fixed in the next release.

Re: What are Analysis Errors With respect to Spark Sql DataFrames and DataSets?

2017-05-04 Thread kant kodali
Thanks a lot! On Wed, May 3, 2017 at 4:36 PM, Michael Armbrust wrote: >> if I do dataset.select("nonExistentColumn") then the Analysis Error is thrown at compile time right? > if you do df.as[MyClass].map(_.badFieldName) you will get a compile error. However,

any support to use Spark UDF in HIVE

2017-05-04 Thread Manohar753
Hi, I have seen many Hive UDFs being used in Spark SQL, so is there any way to do the reverse? I want to write a UDF in Spark and have the same code usable in Hive. Please suggest all possible approaches in Spark with Java. Thanks in advance. Regards, Manoh