Re: What are you using Spark for

2016-08-01 Thread Rodrick Brown
Each of our microservices logs events to a Kafka topic; we then use Spark to consume messages from that queue and write them into Elasticsearch. The data from ES is used by a number of support applications: graphs, monitoring, reports, dashboards for client services teams, etc.

Re: What are you using Spark for

2016-08-01 Thread Xiao Li
Hi Rohit, The Spark Summit has many interesting use cases. Hopefully, it can answer your question. https://spark-summit.org/2015/schedule/ https://spark-summit.org/2016/schedule/ Thanks, Xiao 2016-08-01 22:48 GMT-07:00 Rohit L : > Hi Everyone, > > I want to know

Spark GraphFrames

2016-08-01 Thread Divya Gehlot
Hi, Has anybody worked with GraphFrames? Please let me know, as I need to know the real-world scenarios where it can be used. Thanks, Divya

What are you using Spark for

2016-08-01 Thread Rohit L
Hi Everyone, I want to know the real-world use cases for which Spark is used, so can you please share for what purpose you are using Apache Spark in your project? -- Rohit

Re: Sqoop On Spark

2016-08-01 Thread Takeshi Yamamuro
Hi, Have you seen this previous thread? https://www.mail-archive.com/user@spark.apache.org/msg49025.html I'm not sure this is what you want though. // maropu On Tue, Aug 2, 2016 at 1:52 PM, Selvam Raman wrote: > Hi Team, > > how can i use spark as execution engine in

Sqoop On Spark

2016-08-01 Thread Selvam Raman
Hi Team, How can I use Spark as the execution engine in Sqoop2? I see the patch (SQOOP-1532) but it shows as in progress, so can we not use Sqoop on Spark? Please help me if you have any idea. -- Selvam Raman "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"

Re: [MLlib] Term Frequency in TF-IDF seems incorrect

2016-08-01 Thread Yanbo Liang
Hi Hao, HashingTF directly applies a hash function (MurmurHash3) to the features to determine their column index. It does not take the term frequency or the length of the document into account. It does work similar to sklearn's FeatureHasher. The result is increased speed and reduced
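A minimal sketch of the HashingTF behaviour described above (the input DataFrame docs and the column names are illustrative, not from this thread):

import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF()
  .setInputCol("words")
  .setOutputCol("rawFeatures")
  .setNumFeatures(1 << 18)

// HashingTF hashes each term to a column index and stores raw counts;
// it does not divide by document length. A length-normalized frequency
// would have to be computed separately from the "words" column.
val featurized = hashingTF.transform(tokenizer.transform(docs))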

unsubscribe

2016-08-01 Thread zhangjp
unsubscribe

Re: sql to spark scala rdd

2016-08-01 Thread sri hari kali charan Tummala
Hi All, The code below calculates a cumulative sum (running sum) and a moving average using Scala RDD-style programming. I was using the wrong function, which is sliding; use scanLeft instead. sc.textFile("C:\\Users\\kalit_000\\Desktop\\Hadoop_IMP_DOC\\spark\\data.txt") .map(x => x.split("\\~")) .map(x
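For reference, a minimal local Scala sketch of the scanLeft idea (the values are made up; scanLeft and sliding are Scala collection methods, so on a large distributed RDD the window/DataFrame approach discussed elsewhere in this thread is the simpler route):

val balances = Seq(10.0, 20.0, 5.0, 15.0)
// cumulative (running) sum; drop the leading zero that scanLeft emits
val runningSum = balances.scanLeft(0.0)(_ + _).tail   // Seq(10.0, 30.0, 35.0, 50.0)
// 3-element moving average
val movingAvg = balances.sliding(3).map(w => w.sum / w.size).toSeq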

Re: tpcds for spark2.0

2016-08-01 Thread kevin
Finally I used spark-sql-perf-0.4.3: ./bin/spark-shell --jars /home/dcos/spark-sql-perf-0.4.3/target/scala-2.11/spark-sql-perf_2.11-0.4.3.jar --executor-cores 4 --executor-memory 10G --master spark://master1:7077 If I don't use "--jars" I get the error I mentioned. 2016-07-29 21:17

Re: spark.read.format("jdbc")

2016-08-01 Thread kevin
I fixed it by: val jdbcDF = spark.read.format("org.apache.spark.sql.execution.datasources.jdbc.DefaultSource") .options(Map("url" -> s"jdbc:mysql://${mysqlhost}:3306/test", "driver" -> "com.mysql.jdbc.Driver", "dbtable" -> "i_user", "user" -> "root", "password" -> "123456")) .load()
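For comparison, the usual short form, which assumes a build where only one jdbc data source is on the classpath (not the case in this thread), would be a sketch like:

val jdbcDF = spark.read.format("jdbc")
  .options(Map(
    "url" -> s"jdbc:mysql://${mysqlhost}:3306/test",
    "driver" -> "com.mysql.jdbc.Driver",
    "dbtable" -> "i_user",
    "user" -> "root",
    "password" -> "123456"))
  .load()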

Re: Writing custom Transformers and Estimators like Tokenizer in spark ML

2016-08-01 Thread Steve Rowe
UnaryTransformer's scaladoc says "Abstract class for transformers that take one input column, apply transformation, and output the result as a new column." If you want to allow specification of more than one input column, or if your output column already exists, or you want multiple output
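A minimal sketch of a single-input-column UnaryTransformer along those lines (the class name, columns, and transformation are invented for illustration):

import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{DataType, StringType}

class LowerCaser(override val uid: String)
    extends UnaryTransformer[String, String, LowerCaser] {
  def this() = this(Identifiable.randomUID("lowerCaser"))
  // the per-row function applied to the single input column
  override protected def createTransformFunc: String => String = _.toLowerCase
  override protected def outputDataType: DataType = StringType
}

// usage: new LowerCaser().setInputCol("text").setOutputCol("textLower").transform(df)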

Re: The equivalent for INSTR in Spark FP

2016-08-01 Thread Mich Talebzadeh
Thanks Jacek. It sounds like the issue is the position of the second variable in substring(). This works: scala> val wSpec2 = Window.partitionBy(substring($"transactiondescription",1,20)) wSpec2: org.apache.spark.sql.expressions.WindowSpec = org.apache.spark.sql.expressions.WindowSpec@1a4eae2 Using

[MLlib] Term Frequency in TF-IDF seems incorrect

2016-08-01 Thread Hao Ren
When computing term frequency, we can use either the HashingTF or CountVectorizer feature extractors. However, both of them just use the number of times that a term appears in a document. It is not a true frequency; actually, it should be divided by the length of the document. Is this the intended behavior?

Re: Plans for improved Spark DataFrame/Dataset unit testing?

2016-08-01 Thread Holden Karau
That's a good point - there is an open issue for spark-testing-base to support this shared SparkSession approach - but I haven't had the time ( https://github.com/holdenk/spark-testing-base/issues/123 ). I'll try and include this in the next release :) On Mon, Aug 1, 2016 at 9:22 AM, Koert Kuipers

Re: The equivalent for INSTR in Spark FP

2016-08-01 Thread Jacek Laskowski
Hi, Interesting... I'm tempted to think that the substring function should accept columns that hold the numbers for start and end. I'd love to hear people's thoughts on this. For now, I'd say you need to define a UDF to do the substring, as follows: scala> val mySubstr = udf { (s: String, start: Int,
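A sketch of that UDF approach applied to the INSTR query from this thread (df and the helper name are assumptions; this mimics substring(s, 1, INSTR(s, 'CD') - 2)):

import org.apache.spark.sql.functions.{udf, col}

val beforeCD = udf { (s: String) =>
  val i = s.indexOf("CD")            // 0-based, whereas SQL INSTR is 1-based
  if (i > 1) s.substring(0, i - 1) else s
}

df.select(beforeCD(col("transactiondescription")))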

Re: Problems initializing SparkUI

2016-08-01 Thread Mich Talebzadeh
sounds strange. What happens if you submit the job from the other node?

Re: Problems initializing SparkUI

2016-08-01 Thread Maximiliano Patricio Méndez
Hi, the output of the host where the driver is running: ~$ jps 9895 DriverWrapper 24057 Jps 3531 Worker On that host, I gave too much memory to the driver and no executor could be placed for that worker. 2016-08-01 18:06 GMT-03:00 Mich Talebzadeh : > OK > > Can you

Re: The equivalent for INSTR in Spark FP

2016-08-01 Thread Mich Talebzadeh
Thanks Jacek, Do I have any other way of writing this with functional programming? select substring(transactiondescription,1,INSTR(transactiondescription,' CD')-2), Cheers,

Re: The equivalent for INSTR in Spark FP

2016-08-01 Thread Jacek Laskowski
Hi Mich, There's no indexOf UDF - http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$ Regards, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark Follow me at

Spark 2.0 History Server Storage

2016-08-01 Thread Andrei Ivanov
Hi all, I've just tried upgrading Spark to 2.0 and so far it looks generally good. But there is at least one issue I see right away - job histories are missing storage information (persisted RDDs). This info is also missing from pre-upgrade jobs. Does anyone have a clue what can be wrong?

Re: Problems initializing SparkUI

2016-08-01 Thread Mich Talebzadeh
OK. On the host where the driver program is running, can you do jps and send the output please? HTH

Re: Problems initializing SparkUI

2016-08-01 Thread Maximiliano Patricio Méndez
Hi, thanks again for the answer. Looking a little bit closer into that, I found out that the DriverWrapper process was not running on the hostname the log reported. It is running, but on another host. Mystery. If I manually go to the host that has the DriverWrapper running on it, on port 4040,

Re: sampling operation for DStream

2016-08-01 Thread Cody Koeninger
Put the queue in a static variable that is first referenced on the workers (inside an rdd closure). That way it will be created on each of the workers, not the driver. Easiest way to do that is with a lazy val in a companion object. On Mon, Aug 1, 2016 at 3:22 PM, Martin Le
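A minimal sketch of that pattern (the object name, queue type, and dstream are illustrative):

import scala.collection.mutable

object ReservoirHolder {
  // lazy, so the queue is created inside each executor JVM the first time
  // the closure below references it -- not on the driver
  lazy val reservoir = new mutable.Queue[String]()
}

dstream.transform { rdd =>
  rdd.mapPartitions { iter =>
    val q = ReservoirHolder.reservoir   // per-JVM (per-executor) queue
    iter.map { x => q.enqueue(x); x }
  }
}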

Re: Problems initializing SparkUI

2016-08-01 Thread Mich Talebzadeh
Fine. In that case which process is your driver program (from jps output)? Thanks

Re: The equivalent for INSTR in Spark FP

2016-08-01 Thread Mich Talebzadeh
Any programming expert who can shed some light on this? Thanks

Re: Problems initializing SparkUI

2016-08-01 Thread Maximiliano Patricio Méndez
Hi Mich, thanks for replying. I'm deploying from the same instance from where I showed the logs and commands, using --deploy-mode cluster. The SparkSubmit process only appears while the bin/spark-submit binary is active. When the application starts and the driver takes control, the SparkSubmit

Re: sampling operation for DStream

2016-08-01 Thread Martin Le
How do I do that? If I put the queue inside the .transform operation, it doesn't work. On Mon, Aug 1, 2016 at 6:43 PM, Cody Koeninger wrote: > Can you keep a queue per executor in memory? > > On Mon, Aug 1, 2016 at 11:24 AM, Martin Le > wrote: > > Hi Cody

Re: Java Recipes for Spark

2016-08-01 Thread Gourav Sengupta
JAVA? AGAIN? I am getting into serious depression Regards, Gourav On Mon, Aug 1, 2016 at 9:03 PM, Marco Mistroni wrote: > Hi jg > +1 for link. I'd add ML and graph examples if u can > -1 for programmign language choice :)) > > > kr > > On 31 Jul 2016 9:13 pm,

Re: Java Recipes for Spark

2016-08-01 Thread Marco Mistroni
Hi jg +1 for link. I'd add ML and graph examples if u can -1 for programming language choice :)) kr On 31 Jul 2016 9:13 pm, "Jean Georges Perrin" wrote: > Thanks Guys - I really appreciate :)... If you have any idea of something > missing, I'll gladly add it. > > (and

MLlib RandomForest (Spark 2.0) predict a single vector

2016-08-01 Thread itai.efrati
Hello, After training a RandomForestRegressor in a PipelineModel using MLlib and DataFrame (Spark 2.0), I loaded the saved model into my RT environment in order to predict using the model. Each request is handled and transformed through the loaded PipelineModel, but in the process I had to convert the

Re: SQL predicate pushdown on parquet or other columnar formats

2016-08-01 Thread Mich Talebzadeh
Hi, You mentioned: In general, is this optimization done for all columnar databases or file formats? Have you tried it using an ORC file? That is another columnar table/file format. Spark uses a rule-based optimizer. It does not have a cost-based optimizer yet! That is planned for the future, I believe

Re: Problems initializing SparkUI

2016-08-01 Thread Mich Talebzadeh
OK, I can see the Worker (19286 Worker) and the executor (6548 CoarseGrainedExecutorBackend) running on it. Where is spark-submit? Did you submit your job from another node or use another method to run it? HTH

Re: Writing custom Transformers and Estimators like Tokenizer in spark ML

2016-08-01 Thread janardhan shetty
What is the difference between the UnaryTransformer and Transformer classes? In which scenarios should we use one or the other? On Sun, Jul 31, 2016 at 8:27 PM, janardhan shetty wrote: > Developing in scala but any help with difference between UnaryTransformer > (Is this

SQL predicate pushdown on parquet or other columnar formats

2016-08-01 Thread Sandeep Joshi
Hi I just want to confirm my understanding of the physical plan generated by Spark SQL while reading from a Parquet file. When multiple predicates are pushed to the PrunedFilterScan, does Spark ensure that the Parquet file is not read multiple times while evaluating each predicate ? In general,
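One way to see what is actually pushed down is to inspect the physical plan (a sketch; the file path and columns are placeholders):

import spark.implicits._

val df = spark.read.parquet("/data/events.parquet")
df.filter($"age" > 30 && $"country" === "US").explain(true)
// the plan's PushedFilters entry lists the predicates handed to the Parquet reader,
// e.g. PushedFilters: [GreaterThan(age,30), EqualTo(country,US)]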

Re: Problems initializing SparkUI

2016-08-01 Thread Maximiliano Patricio Méndez
I just recently tried again; port 4040 is not used. And even if it were, I think the log would reflect that by trying to use the following port (4041), as you mentioned. This is what the driver log says: 16/08/01 13:55:56 INFO Utils: Successfully started service 'SparkUI' on port 4040. 16/08/01

The equivalent for INSTR in Spark FP

2016-08-01 Thread Mich Talebzadeh
Hi, What is the FP equivalent of the following window/analytic query, which works OK in Spark SQL? This one uses INSTR: select substring(transactiondescription,1,INSTR(transactiondescription,'CD')-2), select distinct * from ( select

python 'Jupyter' data frame problem with autocompletion

2016-08-01 Thread Andy Davidson
I started using python3 and jupyter in a Chrome browser. I seem to be having trouble with data frame code completion. Regular python functions seem to work correctly. I wonder if I need to import something so the notebook knows about data frames? Kind regards Andy

Re: Tuning level of Parallelism: Increase or decrease?

2016-08-01 Thread Nikolay Zhebet
Yes, Spark always tries to deliver the snippet of code to the data (not vice versa). But you should realize that if you try to run groupBy or join on a large dataset, then you always have to migrate temporary, locally grouped data from one worker node to another (it is a shuffle operation, as I

Re: Problems initializing SparkUI

2016-08-01 Thread Mich Talebzadeh
Can you check if port 4040 is actually used? If it is used, the next available one would be 4041. For example, below Zeppelin uses it:
netstat -plten | grep 4040
tcp 0 0 :::4040 :::* LISTEN 1005 73372882 10699/java
ps aux | grep 10699
hduser 10699 0.1

Re: Problems initializing SparkUI

2016-08-01 Thread Maximiliano Patricio Méndez
Hi, Thanks for the answers. @Jacek: To verify if the UI is up, I log in to all the worker nodes of my cluster and run netstat -nltp | grep 4040 with no result. The log of the driver tells me on which server and on which port the Spark UI should be up, but it isn't. @Mich: I've tried to specify

Re: spark 2.0 readStream from a REST API

2016-08-01 Thread Amit Sela
I think you're missing: val query = wordCounts.writeStream .outputMode("complete") .format("console") .start() Did it help? On Mon, Aug 1, 2016 at 2:44 PM Jacek Laskowski wrote: > On Mon, Aug 1, 2016 at 11:01 AM, Ayoub Benali > wrote: > >

Re: sampling operation for DStream

2016-08-01 Thread Cody Koeninger
Can you keep a queue per executor in memory? On Mon, Aug 1, 2016 at 11:24 AM, Martin Le wrote: > Hi Cody and all, > > Thank you for your answer. I implement simple random sampling (SRS) for > DStream using transform method, and it works fine. > However, I have a problem

Re: Plans for improved Spark DataFrame/Dataset unit testing?

2016-08-01 Thread Koert Kuipers
We share a single SparkSession across tests, and they can run in parallel. It is pretty fast. On Mon, Aug 1, 2016 at 12:02 PM, Everett Anderson wrote: > Hi, > > Right now, if any code uses DataFrame/Dataset, I need a test setup that > brings up a local master as in

Re: sampling operation for DStream

2016-08-01 Thread Martin Le
Hi Cody and all, Thank you for your answer. I implemented simple random sampling (SRS) for DStream using the transform method, and it works fine. However, I have a problem when I implement reservoir sampling (RS). In RS, I need to maintain a reservoir (a queue) to store selected data items (RDDs). If I

Re: Tuning level of Parallelism: Increase or decrease?

2016-08-01 Thread Jestin Ma
Hi Nikolay, I'm looking at data locality improvements for Spark, and I have conflicting sources on using YARN for Spark. Reynold said that Spark workers automatically take care of data locality here:

Plans for improved Spark DataFrame/Dataset unit testing?

2016-08-01 Thread Everett Anderson
Hi, Right now, if any code uses DataFrame/Dataset, I need a test setup that brings up a local master as in this article. That's a lot of overhead for unit testing and the tests can't run in

Re: Possible to push sub-queries down into the DataSource impl?

2016-08-01 Thread Timothy Potter
yes, that's exactly what I was looking for, thanks for the pointer ;-) On Thu, Jul 28, 2016 at 1:07 AM, Takeshi Yamamuro wrote: > Hi, > > Have you seen this ticket? > https://issues.apache.org/jira/browse/SPARK-12449 > > // maropu > > On Thu, Jul 28, 2016 at 2:13 AM,

Spark 2 and Solr

2016-08-01 Thread Andrés Ivaldi
Hello, does anyone know if Spark 2.0 will have a Solr connector? Lucidworks has one but it is not available yet for Spark 2.0. thanks!!

Need Advice: Spark-Streaming Setup

2016-08-01 Thread David Kaufman
Hi, I'm currently working on a Spark/HBase setup which processes log files (~10 GB/day). These log files are persisted hourly on n > 10 application servers and copied to a 4-node HDFS. Our current Spark job aggregates single visits (based on a session-uuid) across all application servers on a

Re: Windows operation orderBy desc

2016-08-01 Thread Mich Talebzadeh
You need to get the position right: val wSpec = Window.partitionBy("col1").orderBy(desc("col2")) HTH
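Spelled out with the import it needs (a minimal sketch):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, desc}

val wSpec = Window.partitionBy("col1").orderBy(desc("col2"))
// equivalently, starting from a Column rather than a String:
// Window.partitionBy("col1").orderBy(col("col2").desc)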

Windows operation orderBy desc

2016-08-01 Thread Ashok Kumar
Hi, in the following Window spec I want the orderBy column to be displayed in descending order please: val W = Window.partitionBy("col1").orderBy("col2") If I do val W = Window.partitionBy("col1").orderBy("col2".desc) it throws an error: console>:26: error: value desc is not a member of String How can I

Re: Windows - Spark 2 - Standalone - Worker not able to connect to Master

2016-08-01 Thread Nikolay Zhebet
Your exception says that you have connection trouble with the Spark master. Check if it is available from the environment where you are trying to run the job. On a Linux system these commands can be suitable for this: "telnet 127.0.0.1 7077" or "netstat -ntpl | grep 7077" or "nmap 127.0.0.1 | grep 7077".

Re: spark 2.0 readStream from a REST API

2016-08-01 Thread Jacek Laskowski
On Mon, Aug 1, 2016 at 11:01 AM, Ayoub Benali wrote: > the problem now is that when I consume the dataframe for example with count > I get the stack trace below. Mind sharing the entire pipeline? > I followed the implementation of TextSocketSourceProvider to

Re: Windows - Spark 2 - Standalone - Worker not able to connect to Master

2016-08-01 Thread ayan guha
No, I confirmed the master is running via the Spark UI at localhost:8080. On 1 Aug 2016 18:22, "Nikolay Zhebet" wrote: > I think you haven't run spark master yet, or maybe port 7077 is not yours > default port for spark master. > > 2016-08-01 4:24 GMT+03:00 ayan guha

Re: ERROR Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

2016-08-01 Thread Ted Yu
Have you seen the following ? http://stackoverflow.com/questions/27553547/xloggc-not-creating-log-file-if-path-doesnt-exist-for-the-first-time On Sat, Jul 23, 2016 at 5:18 PM, Ascot Moss wrote: > I tried to add -Xloggc:./jvm_gc.log > > --conf

Testing --supervise flag

2016-08-01 Thread Noorul Islam K M
Hi all, I was trying to test the --supervise flag of spark-submit. The documentation [1] says that the flag helps in restarting your application automatically if it exited with a non-zero exit code. I am looking for some clarification on that documentation. In this context, does application mean

Re: JettyUtils.createServletHandler Method not Found?

2016-08-01 Thread Ted Yu
Original discussion was about Spark 1.3 Which Spark release are you using ? Cheers On Mon, Aug 1, 2016 at 1:37 AM, bg_spark <1412743...@qq.com> wrote: > hello,I have the same problem like you, how do you solve the problem? > > > > -- > View this message in context: >

java.net.UnknownHostException

2016-08-01 Thread pseudo oduesp
Hi, I get the following errors when I try using PySpark 2.0 with IPython on YARN. Can someone help me please? java.lang.IllegalArgumentException: java.net.UnknownHostException: s001.bigdata.;s003.bigdata;s008bigdata. at

Re: multiple spark streaming contexts

2016-08-01 Thread Nikolay Zhebet
You can always save data in HDFS where you need, and you can control parallelism in your app by configuring --driver-cores and --driver-memory. With this approach the Spark master can handle your failure issues, data locality, etc. But if you want to control it yourself with

Re: spark 2.0 readStream from a REST API

2016-08-01 Thread Ayoub Benali
Hello, using the full class name worked, thanks. The problem now is that when I consume the dataframe, for example with count, I get the stack trace below. I followed the implementation of TextSocketSourceProvider

Re: Getting error, when I do df.show()

2016-08-01 Thread Saisai Shao
> > java.lang.NoClassDefFoundError: spray/json/JsonReader > > at > com.memsql.spark.pushdown.MemSQLPhysicalRDD$.fromAbstractQueryTree(MemSQLPhysicalRDD.scala:95) > > at > com.memsql.spark.pushdown.MemSQLPushdownStrategy.apply(MemSQLPushdownStrategy.scala:49) >

Re: JettyUtils.createServletHandler Method not Found?

2016-08-01 Thread bg_spark
Hello, I have the same problem as you. How did you solve it? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/JettyUtils-createServletHandler-Method-not-Found-tp22262p27446.html Sent from the Apache Spark User List mailing list archive at

Re: Windows - Spark 2 - Standalone - Worker not able to connect to Master

2016-08-01 Thread Nikolay Zhebet
I think you haven't run the Spark master yet, or maybe port 7077 is not your default port for the Spark master. 2016-08-01 4:24 GMT+03:00 ayan guha : > Hi > > I just downloaded Spark 2.0 on my windows 7 to check it out. However, not > able to set up a standalone cluster: > > > Step

Getting error, when I do df.show()

2016-08-01 Thread Subhajit Purkayastha
I am getting this error in the spark-shell when I do df.show(). Which jar file do I need to download to fix this error? Error: scala> val df = msc.sql(query) df: org.apache.spark.sql.DataFrame = [id: int, name: string] scala> df.show() java.lang.NoClassDefFoundError:

Re: Tuning level of Parallelism: Increase or decrease?

2016-08-01 Thread Nikolay Zhebet
Hi. Maybe "data locality" can help. If you use groupBy and joins, then most likely you will see a lot of network operations, which can be very slow. You can try to prepare and transform your information in a way that minimizes transporting temporary information between worker nodes. Try

Re: multiple spark streaming contexts

2016-08-01 Thread Sumit Khanna
Hey Nikolay, I know the approach, but it pretty much doesn't fit the bill for my use case, wherein each topic needs to be logged / persisted at a separate HDFS location. I am looking for something where a streaming context pertains to a topic and that topic only, and was wondering if I could have

Re: multiple spark streaming contexts

2016-08-01 Thread Nikolay Zhebet
Hi, If you want to read several Kafka topics in a Spark Streaming job, you can set the topic names separated by commas and then read all messages from all topics in one flow: val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap val lines = KafkaUtils.createStream[String,
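Sketched out with the surrounding setup (the ZooKeeper quorum, group id, topic names, and thread count are placeholders; this assumes the receiver-based spark-streaming-kafka 0.8 API used in the snippet above):

import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val ssc = new StreamingContext(sc, Seconds(10))
val numThreads = "2"
val topicMap = "topicA,topicB,topicC".split(",").map((_, numThreads.toInt)).toMap
// one DStream carrying the messages of all listed topics
val lines = KafkaUtils.createStream(ssc, "zk1:2181", "my-group", topicMap).map(_._2)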

Re: spark.read.format("jdbc")

2016-08-01 Thread Nikolay Zhebet
You should specify the classpath for your JDBC connection. For example, if you want to connect to Impala, you can try this snippet: import java.util.Properties import org.apache.spark._ import org.apache.spark.sql.SQLContext import java.sql.Connection import java.sql.DriverManager

Re: sql to spark scala rdd

2016-08-01 Thread Sri
Hi, I solved it using Spark SQL, which uses the similar window functions mentioned below; for my own knowledge I am trying to solve it using Scala RDDs, which I am unable to do. What function in Scala supports a window function like SQL's UNBOUNDED PRECEDING AND CURRENT ROW? Is it sliding? Thanks Sri

Re: multiple spark streaming contexts

2016-08-01 Thread Sumit Khanna
Any ideas guys? What are the best practices for processing multiple streams? I could trace a few Stack Overflow comments wherein they recommend a separate jar for each stream / use case. But that isn't quite what I want, as in it's better if one / multiple spark streaming

Re: spark.read.format("jdbc")

2016-08-01 Thread kevin
Maybe there is another version of Spark on the classpath? 2016-08-01 14:30 GMT+08:00 kevin : > hi,all: >I try to load data from jdbc datasource,but I got error with : > java.lang.RuntimeException: Multiple sources found for jdbc >

spark.read.format("jdbc")

2016-08-01 Thread kevin
Hi all: I try to load data from a JDBC data source, but I get an error: java.lang.RuntimeException: Multiple sources found for jdbc (org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider, org.apache.spark.sql.execution.datasources.jdbc.DefaultSource), please specify the fully

Re: sql to spark scala rdd

2016-08-01 Thread Mich Talebzadeh
Hi, You mentioned: "I already solved it using DF and spark sql ..." Are you referring to this code, which is classic analytics: SELECT DATE, balance, SUM(balance) OVER (ORDER BY DATE ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) daily_balance FROM table So how did you solve it using

Re: sql to spark scala rdd

2016-08-01 Thread Sri
Hi, Just wondering how Spark SQL works behind the scenes: does it not convert the SQL to some Scala RDD? Or Scala? How do I write the SQL below in Scala or as a Scala RDD? SELECT DATE, balance, SUM(balance) OVER (ORDER BY DATE ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT
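For what it's worth, the DataFrame (rather than raw RDD) equivalent of that running-sum query would look roughly like this (a sketch; df is assumed to be the table loaded as a DataFrame):

import spark.implicits._
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.sum

// ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
val w = Window.orderBy("DATE").rowsBetween(Long.MinValue, 0)
val dailyBalance = df.select($"DATE", $"balance", sum($"balance").over(w).as("daily_balance"))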