Re: Spark 2.0.0 - Apply schema on few columns of dataset

2016-08-07 Thread Ewan Leith
Looking at the Encoders API documentation at http://spark.apache.org/docs/latest/api/java/: in Java, encoders are specified by calling static methods on Encoders, e.g. List<String> data = Arrays.asList("abc", "abc", "xyz");
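A minimal Scala sketch of using those static encoders (the SparkSession setup is illustrative scaffolding, not from the original message):
import java.util.Arrays
import org.apache.spark.sql.{Encoders, SparkSession}
// Illustrative local session for trying the snippet out
val spark = SparkSession.builder().appName("encoders-example").master("local[*]").getOrCreate()
// The same static-method encoders work from Scala; pass the encoder explicitly
val data = Arrays.asList("abc", "abc", "xyz")
val ds = spark.createDataset(data)(Encoders.STRING)
ds.show()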

Re: Spark 2.0.0 - Apply schema on few columns of dataset

2016-08-07 Thread Aseem Bansal
Hi All, has anyone done this with the Java API? On Fri, Aug 5, 2016 at 5:36 PM, Aseem Bansal wrote: > I need to use only a few columns out of a CSV, but as there is no option to > read just a few columns from a CSV: > 1. I am reading the whole CSV using SparkSession.csv() > 2.
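A hedged sketch of that read-then-select approach in Scala (the file path and column names are placeholders):
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().appName("csv-select").master("local[*]").getOrCreate()
// Read the whole CSV, then immediately keep only the columns that are needed
val df = spark.read.option("header", "true").csv("/path/to/input.csv")
val selected = df.select("col1", "col2")
selected.show()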

Re: Any exceptions during an action don't fail the Spark streaming batch in yarn-client mode

2016-08-07 Thread ayan guha
Is it a Python app? On Mon, Aug 8, 2016 at 2:44 PM, Hemalatha A <hemalatha.amru...@googlemail.com> wrote: > Hello, > > I am seeing multiple exceptions in the logs during an action, but none > of them fails the Spark streaming batch in yarn-client mode, whereas the > same exception is thrown

Any exceptions during an action don't fail the Spark streaming batch in yarn-client mode

2016-08-07 Thread Hemalatha A
Hello, I am seeing multiple exceptions in the logs during an action, but none of them fails the Spark streaming batch in yarn-client mode, whereas the same exception is thrown in yarn-cluster mode and the application ends. I am trying to save a DataFrame to Cassandra, which results in an error
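For reference, a hedged sketch of the kind of save being described (assumes the spark-cassandra-connector is on the classpath; the keyspace and table names are placeholders):
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().appName("cassandra-save").getOrCreate()
// Illustrative DataFrame standing in for the batch being written
val df = spark.range(10).toDF("id")
df.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "myks", "table" -> "mytable"))
  .mode("append")
  .save()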

Re: submitting spark job with kerberized Hadoop issue

2016-08-07 Thread Ted Yu
The link in Jerry's response was quite old. Please see: http://hbase.apache.org/book.html#security Thanks On Sun, Aug 7, 2016 at 6:55 PM, Saisai Shao wrote: > 1. Standalone mode doesn't support accessing kerberized Hadoop, simply > because it lacks the mechanism to

Re: silence the spark debug logs

2016-08-07 Thread Sachin Janani
Hi, You can switch off the logs by setting the log level to OFF as follows: import org.apache.log4j.Logger; import org.apache.log4j.Level; Logger.getLogger("org").setLevel(Level.OFF); Logger.getLogger("akka").setLevel(Level.OFF) Regards, Sachin J On Mon, Aug 8, 2016 at 9:39 AM, Sumit Khanna

silence the spark debug logs

2016-08-07 Thread Sumit Khanna
Hello, I don't want to print all the Spark logs, but only a few, e.g. just the execution plans. How do I silence the Spark debug output? Thanks, Sumit

Re: [Spark1.6] Or (||) operator not working in DataFrame

2016-08-07 Thread Divya Gehlot
I tried with a condition expression also, but it didn't work :( On Aug 8, 2016 11:13 AM, "Chanh Le" wrote: > You should use *df.where(conditionExpr)*, which is more convenient for > expressing simple terms in SQL. > > > /** > * Filters rows using the given SQL expression. >

Re: [Spark1.6] Or (||) operator not working in DataFrame

2016-08-07 Thread Chanh Le
You should use df.where(conditionExpr), which is more convenient for expressing simple terms in SQL. /** * Filters rows using the given SQL expression. * {{{ * peopleDf.where("age > 15") * }}} * @group dfops * @since 1.5.0 */ def where(conditionExpr: String): DataFrame = {
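For the OR case from this thread, a minimal sketch of putting both conditions into one SQL expression string (the data and column name are illustrative):
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().appName("where-example").master("local[*]").getOrCreate()
import spark.implicits._
// Illustrative data; the column name mirrors the thread's examples
val df = Seq("DEB", "CRE", "BGC").toDF("transactiontype")
// Both conditions expressed in a single SQL string, no || operator needed
df.where("transactiontype = 'DEB' OR transactiontype = 'CRE'").show()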

Re: [SPARK-2.0][SQL] UDF containing non-serializable object does not work as expected

2016-08-07 Thread Muthu Jayakumar
Hello Hao Ren, Doesn't the code val add = udf { (a: Int) => a + notSer.value } mean a UDF of type Int => Int? Thanks, Muthu On Sun, Aug 7, 2016 at 2:31 PM, Hao Ren wrote: > I am playing with Spark 2.0 > What I tried to test is: > > Create a UDF in which

Random forest binary classification: H2O vs Spark difference

2016-08-07 Thread Javier Rey
Hi everybody. I ran RF on H2O and had no trouble with null values, but in contrast, in Spark using DataFrames and the ML library I get this error. I know my DataFrame contains nulls, but I understood that Random Forest supports null values: "Values to assemble cannot be null" Any
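That message is the VectorAssembler's null check rather than the forest itself, so a hedged workaround is to drop (or fill) nulls in the feature columns before assembling (column names and data are illustrative):
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().appName("rf-nulls").master("local[*]").getOrCreate()
import spark.implicits._
// Illustrative frame where one feature value is null
val df = Seq((Some(1.0), 2.0), (None, 3.0)).toDF("f1", "f2")
// Remove rows with nulls in the feature columns before assembling
val clean = df.na.drop(Seq("f1", "f2"))
val assembler = new VectorAssembler().setInputCols(Array("f1", "f2")).setOutputCol("features")
assembler.transform(clean).show()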

Re: submitting spark job with kerberized Hadoop issue

2016-08-07 Thread Saisai Shao
1. Standalone mode doesn't support accessing kerberized Hadoop, simply because it lacks the mechanism to distribute delegation tokens via the cluster manager. 2. For the HBase token fetching failure, I think you have to do kinit to generate a TGT before starting the Spark application (
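For reference, that kinit step typically looks like the following, reusing the keytab path and principal that appear later in this thread: kinit -kt /etc/hadoop/conf/spark.keytab spark/hadoop-master@platalyticsrealm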

Using Kryo for DataFrames (Dataset)?

2016-08-07 Thread Jestin Ma
When using DataFrames (Dataset), there's no option for an Encoder. Does that mean DataFrames (since it builds on top of an RDD) use Java serialization? Does using Kryo make sense as an optimization here? If not, what's the difference between Java/Kryo serialization, Tungsten, and Encoders?
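For a custom type, you can request a Kryo-backed encoder explicitly with Encoders.kryo; a hedged sketch (the Person class is illustrative):
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
// An illustrative plain class, which gets no built-in product/bean encoder
class Person(val name: String, val age: Int) extends Serializable
val spark = SparkSession.builder().appName("kryo-encoder").master("local[*]").getOrCreate()
// Ask Spark to serialize Person values with Kryo inside the Dataset
implicit val personEncoder: Encoder[Person] = Encoders.kryo[Person]
val ds = spark.createDataset(Seq(new Person("alice", 30), new Person("bob", 25)))
println(ds.count())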

[SPARK-2.0][SQL] UDF containing non-serializable object does not work as expected

2016-08-07 Thread Hao Ren
I am playing with Spark 2.0. What I tried to test is: create a UDF in which there is a non-serializable object. What I expected is that when this UDF is called while materializing the DataFrame where the UDF is used in "select", a task-not-serializable exception should be thrown. It depends also
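A compact sketch of that test (the UDF line is from this thread; the NotSerializable class and the rest of the scaffolding are illustrative):
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf
val spark = SparkSession.builder().appName("udf-ser-test").master("local[*]").getOrCreate()
import spark.implicits._
// A deliberately non-serializable object captured by the UDF's closure
class NotSerializable { val value: Int = 1 }
val notSer = new NotSerializable
val add = udf { (a: Int) => a + notSer.value }
// Materializing the DataFrame is where the serialization failure is expected
val df = Seq(1, 2, 3).toDF("a")
df.select(add($"a").as("added")).show()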

Re: submitting spark job with kerberized Hadoop issue

2016-08-07 Thread Aneela Saleem
Thanks Wojciech and Jacek! I tried Spark on YARN with a kerberized cluster and it works fine now. But now when I try to access HBase through Spark I get the following error: 2016-08-07 20:43:57,617 WARN [hconnection-0x24b5fa45-metaLookup-shared--pool2-t1] ipc.RpcClientImpl: Exception encountered

Accessing HBase through Spark with Security enabled

2016-08-07 Thread Aneela Saleem
Hi all, I'm trying to run a Spark job that accesses HBase with security enabled. When I run the following command: */usr/local/spark-2/bin/spark-submit --keytab /etc/hadoop/conf/spark.keytab --principal spark/hadoop-master@platalyticsrealm --class com.platalytics.example.spark.App --master yarn

Re: [Spark1.6] Or (||) operator not working in DataFrame

2016-08-07 Thread Mich Talebzadeh
although the logic should be col1 <> a && col1 <> b to exclude both, like: df.filter('transactiontype > " ").filter(not('transactiontype === "DEB") && not('transactiontype === "BGC")).select('transactiontype).distinct.collect.foreach(println) HTH Dr Mich Talebzadeh LinkedIn *

Re: [Spark1.6] Or (||) operator not working in DataFrame

2016-08-07 Thread Mich Talebzadeh
try something similar to this: df.filter(not('transactiontype === "DEB") || not('transactiontype === "CRE")) HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: [Spark1.6] Or (||) operator not working in DataFrame

2016-08-07 Thread janardhan shetty
Can you try the 'or' keyword instead? On Aug 7, 2016 7:43 AM, "Divya Gehlot" wrote: > Hi, > I have a use case where I need to use the or (||) operator in a filter condition. > It seems it's not working: it's taking the condition before the operator and > ignoring the other filter

Sorting a DStream and taking topN

2016-08-07 Thread Ahmed El-Gamal
I have a DStream in Spark Scala and I want to sort it and then take the top N. The problem is that whenever I try to run it I get a NotSerializableException, and the exception message says: This is because the DStream object is being referred to from within the closure. The problem is that I don't
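One hedged way around the closure issue is to do the ranking inside transform, so only RDD operations are captured (the (word, count) element type and N are illustrative):
import org.apache.spark.streaming.dstream.DStream
// Sort/top inside transform so the DStream itself is never captured in a closure
val n = 10
def topNPerBatch(counts: DStream[(String, Int)]): DStream[(String, Int)] =
  counts.transform { rdd =>
    // top() returns an Array on the driver; re-parallelize to stay a DStream
    val best = rdd.top(n)(Ordering.by[(String, Int), Int](_._2))
    rdd.sparkContext.parallelize(best)
  }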

[Spark1.6] Or (||) operator not working in DataFrame

2016-08-07 Thread Divya Gehlot
Hi, I have a use case where I need to use the or (||) operator in a filter condition. It seems it's not working: it's taking the condition before the operator and ignoring the filter condition after the or operator. Has anybody faced a similar issue? Pseudo code:
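A minimal sketch of an || filter that combines both conditions (illustrative data and column names, not the original pseudo code):
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
val spark = SparkSession.builder().appName("or-filter").master("local[*]").getOrCreate()
import spark.implicits._
val df = Seq(("DEB", 10), ("CRE", 20), ("BGC", 30)).toDF("transactiontype", "amount")
// Parenthesize each comparison so || combines the two complete conditions
df.filter((col("transactiontype") === "DEB") || (col("transactiontype") === "CRE")).show()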

Re: Help testing the Spark Extensions for the Apache Bahir 2.0.0 release

2016-08-07 Thread Luciano Resende
Simple, just help us test the available extensions using Spark 2.0.0... preferably in real workloads that you might be using in your day-to-day usage of Spark. I wrote a quick getting-started guide for using the new MQTT Structured Streaming on my blog, which can serve as an example:
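A hedged sketch of what reading from that MQTT source looks like (the broker URL and topic are placeholders; the provider class name follows Bahir's sql-streaming-mqtt module):
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().appName("mqtt-test").master("local[*]").getOrCreate()
// Bahir's MQTT Structured Streaming source; each MQTT message arrives as a row
val lines = spark.readStream
  .format("org.apache.bahir.sql.streaming.mqtt.MQTTStreamSourceProvider")
  .option("topic", "sensors/temperature")
  .load("tcp://localhost:1883")
val query = lines.writeStream.outputMode("append").format("console").start()
query.awaitTermination()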

Re: Help testing the Spark Extensions for the Apache Bahir 2.0.0 release

2016-08-07 Thread Sivakumaran S
Hi, How can I help? Regards, Sivakumaran S > On 06-Aug-2016, at 6:18 PM, Luciano Resende wrote: > > Apache Bahir is voting on its 2.0.0 release based on Apache Spark 2.0.0. > > https://www.mail-archive.com/dev@bahir.apache.org/msg00312.html >