How does Spark SQL query optimisation work if we are using the .rdd action?

2016-08-13 Thread mayur bhole
Hi all, let's say we have:

val df = bigTableA.join(bigTableB, bigTableA("A") === bigTableB("A"), "left")
val rddFromDF = df.rdd
println(rddFromDF.count)

My understanding is that Spark will convert all DataFrame operations before "rddFromDF.count" into their RDD-equivalent operations, as we are not
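
For what it's worth, everything up to the .rdd call still goes through Catalyst; only operations applied to the resulting RDD bypass the optimizer. A minimal sketch to verify this in the shell, assuming a SparkSession named spark and two hypothetical tables:

    // Hedged sketch: inspect the plans Catalyst produced before
    // dropping down to the RDD API. Table contents are hypothetical.
    val bigTableA = spark.range(1000000).toDF("A")
    val bigTableB = spark.range(1000000).toDF("A")
    val df = bigTableA.join(bigTableB, bigTableA("A") === bigTableB("A"), "left")

    df.explain(true)        // parsed, analyzed, optimized and physical plans
    val rddFromDF = df.rdd  // built from the optimized physical plan
    println(rddFromDF.count)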

Re: Why can't I use a broadcast var defined in a global object?

2016-08-13 Thread Ted Yu
Can you (or David) resend David's reply? I don't see the reply in this thread. Thanks

> On Aug 13, 2016, at 8:39 PM, yaochunnan wrote:
>
> Hi David,
> Your answers have solved my problem! Detailed and accurate. Thank you very much!

Re: Why can't I use a broadcast var defined in a global object?

2016-08-13 Thread yaochunnan
Hi David, Your answers have solved my problem! Detailed and accurate. Thank you very much!

Re: Does Spark SQL support indexes?

2016-08-13 Thread Chanh Le
Hi Taotao, Spark SQL doesn’t support indexes :)

> On Aug 14, 2016, at 10:03 AM, Taotao.Li wrote:
>
> hi, guys, does Spark SQL support indexes? if so, how can I create an index
> on my temp table? if not, how can I handle some specific queries on a very
> large
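
Since there are no secondary indexes, the usual workaround is to lay the data out so Spark can skip most of it at read time. A minimal sketch, assuming a hypothetical DataFrame df with an event_date column and a hypothetical output path:

    // Hedged sketch: partitioning by a commonly-filtered column lets
    // Spark prune whole directories instead of scanning the full table.
    // Paths and column names are hypothetical.
    import org.apache.spark.sql.functions.col

    df.write.partitionBy("event_date").parquet("/data/events")

    // A filter on the partition column reads only the matching partitions.
    val slice = spark.read.parquet("/data/events")
      .filter(col("event_date") === "2016-08-13")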

Does Spark SQL support indexes?

2016-08-13 Thread Taotao.Li
Hi guys, does Spark SQL support indexes? If so, how can I create an index on my temp table? If not, how can I handle some specific queries on a very large table? It would iterate over the whole table even though all I want is just a small piece of it. Great thanks,

Re: KafkaUtils.createStream not picking smallest offset

2016-08-13 Thread Diwakar Dhanuskodi
Not using checkpointing now. The source is producing 1.2 million messages to the topic. We are using ZooKeeper offsets for other downstreams too. That's the reason for going with createStream, which stores offsets in ZooKeeper.
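
For reference, a minimal sketch of the receiver-based stream being described, which commits offsets to ZooKeeper; the quorum, group id and topic are hypothetical:

    // Hedged sketch of KafkaUtils.createStream (Kafka 0.8 receiver API),
    // which tracks consumer offsets in ZooKeeper. Assumes the shell's
    // SparkContext sc.
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val ssc = new StreamingContext(sc, Seconds(10))
    val stream = KafkaUtils.createStream(
      ssc,
      "zk1:2181,zk2:2181",    // ZooKeeper quorum
      "my-consumer-group",    // consumer group; offsets are stored per group
      Map("my-topic" -> 1))   // topic -> number of receiver threads

    stream.map(_._2).count().print()
    ssc.start()
    ssc.awaitTermination()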

Re: mesos or kubernetes?

2016-08-13 Thread Jacek Laskowski
Hi, Thanks Michael! That's exactly what I missed in my understanding of the different options for Spark on XYZ. Thanks! And the last sentence was excellent in helping me understand how DC/OS compares to, say, CDH. Pozdrawiam, Jacek Laskowski

Re: [SQL] Why does (0 to 9).toDF("num").as[String] work?

2016-08-13 Thread Jacek Laskowski
Hi, The point is that I could go fully typed with Dataset[String] and wondered why it's possible with Ints. You're working with DataFrames, which are Dataset[Row]. That's too little for me these days :) Pozdrawiam, Jacek Laskowski

Re: mesos or kubernetes?

2016-08-13 Thread Michael Gummelt
DC/OS Spark *is* Apache Spark on Mesos, along with some packaging that makes it easy to install and manage on DC/OS. For example:

$ dcos package install spark
$ dcos spark run --submit-args="--class SparkPi ..."

The single-command install runs the cluster dispatcher and the history server

Re: [SQL] Why does (0 to 9).toDF("num").as[String] work?

2016-08-13 Thread Mich Talebzadeh
Would not that be as simple as:

scala> (0 to 9).toDF
res14: org.apache.spark.sql.DataFrame = [value: int]

scala> (0 to 9).toDF.map(_.toString)
res13: org.apache.spark.sql.Dataset[String] = [value: string]

with my little knowledge. Dr Mich Talebzadeh

Re: Spark 2.0.0 - Java API - Modify a column in a dataframe

2016-08-13 Thread Jacek Laskowski
Hi, Could Encoders.STRING work? Pozdrawiam, Jacek Laskowski

On Thu, Aug 11, 2016 at 5:28 AM, Aseem Bansal wrote:
> Hi
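
An explicit encoder is one way to do it; a minimal Scala sketch of the idea (Encoders.STRING is the same object exposed to the Java API), assuming a hypothetical DataFrame df with a string column "name":

    // Hedged sketch: pass an explicit encoder instead of relying on
    // implicits when mapping over a DataFrame to modify a column.
    import org.apache.spark.sql.Encoders

    val upper = df.map(row => row.getAs[String]("name").toUpperCase)(Encoders.STRING)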

Re: mesos or kubernetes?

2016-08-13 Thread Jacek Laskowski
Hi, I'm wondering why not DC/OS (with Mesos)? Pozdrawiam, Jacek Laskowski

On Sat, Aug 13, 2016 at 11:24 AM, guyoh wrote:
>

[SQL] Why does (0 to 9).toDF("num").as[String] work?

2016-08-13 Thread Jacek Laskowski
Hi, Just ran into it and can't explain why it works. Please help me understand it.

Q1: Why can I `as[String]` with Ints? Is this type safe?

scala> (0 to 9).toDF("num").as[String]
res12: org.apache.spark.sql.Dataset[String] = [num: int]

Q2: Why can I map over strings even though there are
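
One way to read this behaviour: as[String] only attaches an encoder, and the analyzer resolves it with an up-cast from int to string when the data is actually deserialized, which is why the printed schema is still [num: int]. A minimal sketch of what that looks like in the shell, assuming Spark 2.0 behaviour:

    // Hedged sketch: the int -> string conversion happens at encoder
    // resolution time, not when as[String] is called.
    // (spark-shell imports the needed implicits.)
    val ds = (0 to 9).toDF("num").as[String]
    ds.collect            // expected: Array("0", "1", ..., "9")
    ds.map(_ + "!").show  // typed map over the up-cast strings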

Re: mesos or kubernetes?

2016-08-13 Thread Shuai Lin
Good summary! One more advantage of running Spark on Mesos: community support. There is quite a big user base that runs Spark on Mesos, so if you encounter a problem with your deployment, it's very likely you can get the answer by a simple Google search, or by asking in the Spark/Mesos user list. By

Re: mesos or kubernetes?

2016-08-13 Thread Michael Gummelt
Spark has a first-class scheduler for Mesos, whereas it doesn't for Kubernetes. Running Spark on Kubernetes means running Spark in standalone mode, wrapped in a Kubernetes service: https://github.com/kubernetes/kubernetes/tree/master/examples/spark So you're effectively comparing standalone vs.
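
In practice the difference shows up in the master URL you submit against; a hedged sketch with hypothetical hosts, in the style of the dcos commands above:

$ spark-submit --master mesos://mesos-master:5050 --class SparkPi ...       # first-class Mesos scheduler
$ spark-submit --master spark://standalone-master:7077 --class SparkPi ...  # standalone mode (what the Kubernetes example wraps)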

mesos or kubernetes?

2016-08-13 Thread guyoh
My company is trying to decide whether to use Kubernetes or Mesos. Since we are planning to use Spark in the near future, I was wondering what is the best choice for us. Thanks, Guy

Re: call a mysql stored procedure from spark

2016-08-13 Thread Mich Talebzadeh
to be executed in MySQL with the results sent back to Spark? No, I don't think so. On the other hand, a stored procedure is nothing but compiled code, so can you use the raw SQL behind the stored proc? You can certainly send the SQL via JDBC and get the result set back. HTH, Dr Mich Talebzadeh
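
A minimal sketch of that JDBC route, assuming a SparkSession named spark, a hypothetical MySQL host and credentials, and a query lifted from the procedure body:

    // Hedged sketch: push the raw SQL behind the procedure down to MySQL
    // as a JDBC subquery. Host, credentials and query are hypothetical.
    val props = new java.util.Properties()
    props.setProperty("user", "spark")
    props.setProperty("password", "secret")
    props.setProperty("driver", "com.mysql.jdbc.Driver")

    val df = spark.read.jdbc(
      "jdbc:mysql://mysql-host:3306/mydb",
      "(SELECT id, amount FROM orders WHERE status = 'OPEN') AS t",
      props)
    df.show()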

call a mysql stored procedure from spark

2016-08-13 Thread sujeet jog
Hi, Is there a way to call a stored procedure using Spark? Thanks, Sujeet

Re: Accessing HBase through Spark with Security enabled

2016-08-13 Thread Jacek Laskowski
Hi Aneela, My (little to no) understanding of how to make it work is to set the hbase.security.authentication property to kerberos (see [1]). Spark on YARN uses it to get the tokens for Hive, HBase et al. (see [2]). It happens when the Client starts its conversation with the YARN RM (see [3]). You should not
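
In code that boils down to one setting on the HBase configuration (it can equally live in hbase-site.xml); a minimal sketch:

    // Hedged sketch: the property mentioned above, set programmatically.
    import org.apache.hadoop.hbase.HBaseConfiguration

    val hconf = HBaseConfiguration.create()
    hconf.set("hbase.security.authentication", "kerberos")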

Spark stage concurrency

2016-08-13 Thread Mazen
Suppose a Spark job has two independent stages (they do not depend on each other) and they are submitted concurrently/simultaneously (as TaskSets) by the DAG scheduler to the task scheduler. Can someone give more detailed insight into how the cores available on executors are

Spark Streaming fault tolerance benchmark

2016-08-13 Thread Dominik Safaric
A few months ago I started an empirical study of several stream processing engines, including but not limited to Spark Streaming. As the benchmark should extend beyond performance metrics such as throughput and latency, I've focused on fault tolerance

Re: Spark 2 cannot create ORC table when CLUSTERED. This worked in Spark 1.6.1

2016-08-13 Thread Mich Talebzadeh
Hi, SPARK-17047 created. Thanks, Dr Mich Talebzadeh

Re: Accessing HBase through Spark with Security enabled

2016-08-13 Thread Aneela Saleem
Thanks for your response Jacek! Here is the code showing how Spark accesses HBase:

System.setProperty("java.security.krb5.conf", "/etc/krb5.conf")
System.setProperty("java.security.auth.login.config", "/etc/hbase/conf/zk-jaas.conf")
val hconf = HBaseConfiguration.create()
val tableName = "emp"