I think the tricky part here is that the join condition is encoded in the
second data frame and not a direct value.
Assuming the second data frame (the tags) is small enough, you can collect it
(read it into memory) and then construct a "when" expression chain for each of
the possible tags.
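A minimal sketch of that idea, assuming a frame `df` with a "value" column and a small lookup frame `tags` with (value, tag) columns (all of these names are hypothetical):

```scala
import org.apache.spark.sql.functions.{col, lit, when}

// Collect the small tags frame to the driver (assumes it fits in memory).
val tagPairs = tags.collect().map(r => (r.getString(0), r.getString(1)))

// Fold the collected pairs into one chained when()/otherwise() expression.
val tagExpr = tagPairs.foldLeft(lit(null).cast("string")) {
  case (acc, (v, t)) => when(col("value") === v, lit(t)).otherwise(acc)
}

val tagged = df.withColumn("tag", tagExpr)
```

Each fold step wraps the previous expression in the `otherwise` branch, so the result is one nested conditional rather than a join.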
Hi Satish,
Take a look at the smvTopNRecs() function in the SMV package. It does
exactly what you are looking for. It might be overkill to bring in all of SMV
for just one function but you will also get a lot more than just DF helper
functions (modular views, higher level graphs, dynamic
> I really appreciate the help.
>
> Thanks,
> Divya
>
> On 3 February 2016 at 11:42, Ali Tajeldin EDU <alitedu1...@gmail.com> wrote:
While you can construct the SQL string dynamically in Scala/Java/Python, it
would be best to use the DataFrame API for creating dynamic SQL queries. See
http://spark.apache.org/docs/1.5.2/sql-programming-guide.html for details.
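As a sketch of what "dynamic" looks like with the DataFrame API (the condition list and column names here are made up for illustration):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical runtime input: (column, required value) pairs.
val conditions = Seq(("country", "US"), ("status", "active"))

// Fold the pairs into chained filter() calls on an existing DataFrame `df`,
// instead of concatenating a SQL string by hand.
val filtered = conditions.foldLeft(df: DataFrame) {
  case (acc, (c, v)) => acc.filter(col(c) === v)
}
```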
On Feb 2, 2016, at 6:49 PM, Divya Gehlot
Hi Alexander,
We developed SMV to address the exact issue you mentioned. While it is not a
workflow engine per se, it does allow for the creation of modules with
dependencies and automates the execution of these modules. See
Check out Sameer Farooqui's video on YouTube for Spark internals
(https://www.youtube.com/watch?v=7ooZ4S7Ay6Y&list=PLIxzgeMkSrQ-2Uizm4l0HjNSSy2NxgqjX).
Starting at 2:15:00, he describes YARN mode.
btw, I highly recommend the entire video. Very detailed and concise.
--
Ali
On Dec 7, 2015, at 8:38
You can try running "jstack" a couple of times while the app is hung and look
for patterns in the stack traces to see where it is stuck.
--
Ali
On Dec 3, 2015, at 8:27 AM, Richard Marscher wrote:
> I should add that the pauses are not from GC and also in tracing the CPU call
> tree in
I'm not 100% sure, but I don't think a jar within a jar will work without a
custom class loader. You can perhaps try to use "maven-assembly-plugin" or
"maven-shade-plugin" to build your uber/fat jar. Both of these will build a
flattened single jar.
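For example, a minimal maven-shade-plugin configuration (the version number is just a plausible choice for that era) that produces a single flattened jar at package time:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.4.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```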
--
Ali
On Nov 26, 2015, at 2:49 AM, Marc de
Make sure
"/mnt/local/1024gbxvdf1/all_adleads_cleaned_commas_in_quotes_good_file.csv" is
accessible on your slave node.
--
Ali
On Nov 9, 2015, at 6:06 PM, Sanjay Subramanian
wrote:
> hey guys
>
> I have a 2-node SparkR cluster (1 master, 1 slave) on AWS
You can take a look at the smvPivot function in the SMV library
(https://github.com/TresAmigosSD/SMV). Look for the "smvPivot" method in
SmvDFHelper (
http://tresamigossd.github.io/SMV/scaladocs/index.html#org.tresamigos.smv.SmvDFHelper).
You can also perform the pivot on a
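For reference, if you are on Spark 1.6 or later, the DataFrame API also has a built-in pivot; a sketch with hypothetical id/month/amount columns:

```scala
// Wide table: one row per id, one column per distinct month value.
val pivoted = df.groupBy("id").pivot("month").sum("amount")
```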
The Spark job-server project may help
(https://github.com/spark-jobserver/spark-jobserver).
--
Ali
On Oct 21, 2015, at 11:43 PM, ?? wrote:
> Hi developers, I've encountered some problem with Spark, and before opening
> an issue, I'd like to hear your thoughts.
>
Furthermore, even adding the aliasing suggested by the warning doesn't seem to
help. A slight modification to the example below:
> scala> val largeValues = df.filter('value >= 10).as("lv")
And just looking at the join results:
> scala> val j = smallValues
> .join(largeValues,
Which version of Spark are you using? I just tried the example below on 1.5.1
and it seems to work as expected:
scala> val res = df.groupBy("key").count.agg(min("count"), avg("count"))
res: org.apache.spark.sql.DataFrame = [min(count): bigint, avg(count): double]
scala> res.show
I could be misreading the code, but looking at the code for toLocalIterator
(copied below), it should lazily call runJob on each partition in your input.
It shouldn't be parsing the entire RDD before returning from the first "next"
call. If it is taking a long time on the first "next" call, it
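Since the copied code was cut off above, here is roughly what the 1.x implementation looks like (paraphrased from memory; check the actual RDD source for details):

```scala
// Paraphrased sketch of RDD.toLocalIterator in Spark 1.x: one runJob per
// partition, driven lazily by the iterator, not one job for the whole RDD.
def toLocalIterator: Iterator[T] = {
  def collectPartition(p: Int): Array[T] =
    sc.runJob(this, (iter: Iterator[T]) => iter.toArray, Seq(p)).head
  (0 until partitions.length).iterator.flatMap(i => collectPartition(i))
}
```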
Since DF2 only has the userID, I'm assuming you are using DF2 to filter for
the desired userIDs.
You can just use the join() and groupBy() operations on DataFrame to do what
you desire. For example:
scala> val df1=app.createDF("id:String; v:Integer", "X,1;X,2;Y,3;Y,4;Z,10")
df1:
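The example above is cut off; a sketch of how it might continue, assuming df2 holds only the desired ids (the aggregate column names are made up):

```scala
import org.apache.spark.sql.functions.sum

// Keep only rows whose id appears in df2, then aggregate v per id.
val result = df1.join(df2, "id").groupBy("id").agg(sum("v").as("total"))
```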