I understand that the following are equivalent:
df.filter('account === "acct1")
sql("select * from tempTableName where account = 'acct1'")
But is Spark SQL "smart" enough to also push filter predicates down to the
source for the initial load?
e.g.
sqlContext.read.jdbc(…).filter('account === "acct1")
Hi
I'm looking for some benchmarks on joining data frames where most of the
data is in HDFS (e.g. in parquet) and some "reference" or "metadata" is
still in RDBMS. I am only looking at the very first join before any caching
happens, and I assume there will be loss of parallelization because
I've hit a roadblock trying to understand why Spark doesn't work for a
colleague of mine on his Windows 7 laptop.
I have pretty much the same setup and everything works fine.
I googled the error message and didn't find anything that resolved it.
Here is the exception message (after running spark
Hi Everyone!
I'm trying to understand how Spark's cache works.
Here is my naive understanding, please let me know if I'm missing something:
val rdd1 = sc.textFile(some data)
rdd1.cache() // marks rdd1 as cached
val rdd2 = rdd1.filter(...)
val rdd3 = rdd1.map(...)
rdd2.saveAsTextFile(...)
Speed is an important issue but by no means everything in the real
world, and these are rarely mutually exclusive options in the OSS
world. This is a great piece of work, but I don't think it's some kind
of argument against distributed computing.
On Fri, Mar 27, 2015 at 6:32 PM, Eran Medan
Remember that article that went viral on HN? (Where a guy showed how GraphX
/ Giraph / GraphLab / Spark have worse performance on a 128-core cluster than on
a single-threaded machine? If not, here is the article:
http://www.frankmcsherry.org/graph/scalability/cost/2015/01/15/COST.html)
Well as you may
Hi everyone,
I had a lot of questions today, sorry if I'm spamming the list, but I
thought it's better than posting all questions in one thread. Let me know
if I should throttle my posts ;)
Here is my question:
When I try to have a case class that has Any in it (e.g. I have a property
map and