NICE!
Thanks, Brandon.
Andy.
From: Brandon Geise
Date: Friday, March 30, 2018 at 6:15 PM
To: Andrew Davidson, "user @spark"
Subject: Re: how to create all possible combinations from an array? how to join and…
What's wrong with just using a UDF that runs a for loop in Scala? You can
change the loop logic to produce whatever combinations you want (see the
sketch after the schema below).
scala> spark.version
res4: String = 2.2.1
scala> aggDS.printSchema
root
 |-- name: string (nullable = true)
 |-- colors: array (nullable = true)
 |    |-- element: string
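For example, a minimal sketch of that UDF approach against the aggDS schema
above (the pair rule here, all ordered pairs of distinct elements, is just one
choice; adjust the loop to the combinations you need):

import org.apache.spark.sql.functions.{col, explode, udf}

// Expand each array into all ordered pairs of distinct elements.
val pairs = udf { (xs: Seq[String]) =>
  for {
    i <- xs.indices
    j <- xs.indices
    if i != j
  } yield (xs(i), xs(j))
}

val pairDF = aggDS
  .withColumn("pair", explode(pairs(col("colors"))))
  .select(col("pair._1").as("left"), col("pair._2").as("right"))

pairDF.stat.crosstab("left", "right").show()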
Possibly, instead of doing the initial grouping, just do a full outer join on
xyzy. This is in Scala but should be easy to convert to Python.
val data = Array(("john", "red"), ("john", "blue"), ("john", "red"),
  ("bill", "blue"), ("bill", "red"), ("sam", "green"))
// The original snippet was cut off here; presumably it continued roughly as:
val distData = spark.createDataFrame(data).toDF("name", "color")
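A hedged sketch of that self-join idea (the name/color column names come from
the sample data; filtering out identical pairs is an assumption):

import org.apache.spark.sql.functions.col

val left  = distData.select(col("name"), col("color").as("c1"))
val right = distData.select(col("name"), col("color").as("c2"))

// Joining the data to itself on the key yields every pair of colors seen
// for the same name; the filter drops pairs of a color with itself.
val colorPairs = left.join(right, Seq("name"), "full_outer")
  .where(col("c1") =!= col("c2"))

colorPairs.stat.crosstab("c1", "c2").show()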
I was a little sloppy when I created the sample output; it's missing a few
pairs.
Assume for a given row I have [a, b, c]. I want to create something like the
Cartesian join, e.g. (a, b), (a, c), (b, c) and their reverses.
From: Andrew Davidson
Date: Friday, March 30, 2018 at 5:54 PM
To: "user @spark"
I have a dataframe and execute df.groupBy("xyzy").agg(collect_list("abc")).
This produces a column of type array. Now, for each row, I want to create
multiple pairs/tuples from the array so that I can create a contingency
table. Any idea how I can transform my data so that I can call crosstab()?
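For reference, a minimal sketch of that aggregation (df is the poster's
dataframe; the .as("colors") alias is an assumption to match the schema shown
earlier):

import org.apache.spark.sql.functions.collect_list

// One row per key, with all of that key's values collected into an array.
val aggDS = df.groupBy("xyzy").agg(collect_list("abc").as("colors"))
aggDS.printSchema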
Hi,
I have a Spark job which needs to access HBase inside a mapToPair function. The
problem is that I do not want to connect to HBase and close the connection for
every record.
As I understand it, PairFunction is not designed to manage resources with
setup() and close(), the way Hadoop readers and writers are.
Does…
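The usual workaround (a sketch, not the poster's code; the RDD, table name,
and lookup are assumptions) is to use mapPartitions and open one connection
per partition:

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get}
import org.apache.hadoop.hbase.util.Bytes

// keys: RDD[String] is assumed. One connection per partition, not per record.
val enriched = keys.mapPartitions { iter =>
  val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
  val table = conn.getTable(TableName.valueOf("my_table")) // hypothetical table
  // Materialize before closing: the iterator would otherwise be consumed
  // lazily, after the connection is gone.
  val out = iter.map { k =>
    (k, !table.get(new Get(Bytes.toBytes(k))).isEmpty)
  }.toList
  table.close()
  conn.close()
  out.iterator
}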
Thanks, I will check the SparkSubmit class.
On Fri, Mar 30, 2018 at 2:46 PM, Marcelo Vanzin wrote:
> Why: it's part historical, part "how else would you do it".
>
> SparkConf needs to read properties set on the command line, but
> SparkConf is something that user code…
Why: it's part historical, part "how else would you do it".
SparkConf needs to read properties set on the command line, but
SparkConf is something that user code instantiates, so we can't easily
make it read data from arbitrary locations. You could use thread
locals and other tricks, but…
Does anyone know why all Spark settings end up being system properties, and
where this is done?
For example, when I pass "--conf spark.foo=bar" into spark-submit, then
System.getProperty("spark.foo") will be equal to "bar".
I grepped the Spark codebase for System.setProperty or System.setProperties…
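For what it's worth, the behavior is easy to confirm from a spark-shell
started with --conf spark.foo=bar:

scala> sys.props.get("spark.foo")
res0: Option[String] = Some(bar)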
Hi All,
I'm working on a simple structured streaming query that
uses flatMapGroupsWithState to maintain a relatively large state.
After running the application for a few minutes on my local machine, it
starts to slow down and then crashes with an OutOfMemoryError.
Tracking the code led me to…
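For context, a minimal skeleton of the API in question (the event type, the
Long state, and the output mode are assumptions, not the poster's code):

import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}
import spark.implicits._ // spark is the active SparkSession

case class Event(key: String, value: Long)

// events: Dataset[Event] is assumed; the running sum stands in for whatever
// large per-group state is being kept.
val sums = events
  .groupByKey(_.key)
  .flatMapGroupsWithState(OutputMode.Update, GroupStateTimeout.NoTimeout) {
    (key: String, rows: Iterator[Event], state: GroupState[Long]) =>
      val sum = state.getOption.getOrElse(0L) + rows.map(_.value).sum
      state.update(sum) // unbounded state growth is a common OOM culprit
      Iterator((key, sum))
  }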
My executor OOMs when I use Spark SQL to read data from MySQL.
In
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
I see the following lines. I'm wondering why JDBC_BATCH_FETCH_SIZE has to be
non-negative.
val fetchSize = {
  val size = parameters.getOrElse(JDBC_BATCH_FETCH_SIZE, "0").toInt
  require(size >= 0, ...) // message elided in the original post
  size
}
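For illustration, a read of the kind that hits this code path (the URL, table,
credentials, and fetch size below are placeholders):

val df = spark.read.format("jdbc")
  .option("url", "jdbc:mysql://host:3306/db") // placeholder URL
  .option("dbtable", "big_table")             // placeholder table
  .option("user", "u")
  .option("password", "p")
  .option("fetchsize", "10000")               // must pass the size >= 0 check
  .load()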