[Spark-GraphX] Conductance, Bridge Ratio & Diameter

2018-10-18 Thread Thodoris Zois
Hello, I am trying to compute conductance, bridge ratio and diameter on a given graph, but I face some problems. For the conductance, my problem is how to compute the cuts so that they are at least roughly clustered. Is partitionBy from GraphX related to dividing a graph into multiple
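A rough sketch of the distinction the question touches on (the toy graph and the vertex set S are placeholders, not the poster's data): GraphX's partitionBy only redistributes edges across partitions for storage and communication; it does not produce a vertex cut, so conductance still has to be computed over a cut you choose yourself.

    import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy, VertexId}
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()
    val sc = spark.sparkContext

    // Toy graph: two triangles joined by one bridge edge.
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(3L, 1L, 1),
      Edge(3L, 4L, 1),                                     // bridge
      Edge(4L, 5L, 1), Edge(5L, 6L, 1), Edge(6L, 4L, 1)))
    val graph = Graph.fromEdges(edges, defaultValue = 0)
      .partitionBy(PartitionStrategy.EdgePartition2D)      // redistributes edges only

    // Conductance of a vertex set S: edges crossing the cut divided by
    // the smaller of the two volumes (sum of degrees on each side).
    val s: Set[VertexId] = Set(1L, 2L, 3L)
    val crossing = graph.triplets
      .filter(t => s.contains(t.srcId) != s.contains(t.dstId))
      .count().toDouble
    val degrees = graph.degrees.collect().toMap            // fine for small graphs
    val volS  = s.toSeq.map(v => degrees.getOrElse(v, 0).toLong).sum.toDouble
    val volSc = degrees.values.map(_.toLong).sum.toDouble - volS
    val conductance = crossing / math.min(volS, volSc)     // 1 / 7 for this toy graph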

Mean over window with minimum number of rows

2018-10-18 Thread Sumona Routh
Hi all, before I go the route of rolling my own UDAF: I'm calculating a mean over the last 5 rows, so I have the following window defined: Window.partitionBy(person).orderBy(timestamp).rowsBetween(-4, Window.currentRow). Then I calculate the mean over that window. Within each partition, I'd like the
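A minimal sketch of that window plus one common way to require a minimum number of rows before reporting the mean (the DataFrame df, the value column, and the "null below 5 rows" rule are assumptions, not the poster's code):

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{avg, col, count, when}

    // Sliding frame: the current row and the 4 rows before it, per person.
    val w = Window.partitionBy("person").orderBy("timestamp")
      .rowsBetween(-4, Window.currentRow)

    // Only report the mean once at least 5 rows are in the frame;
    // otherwise leave it null (one possible reading of "minimum rows").
    val result = df.withColumn("last5_mean",
      when(count(col("value")).over(w) >= 5, avg(col("value")).over(w)))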

Re: Spark In Memory Shuffle / 5403

2018-10-18 Thread Peter Liu
I would be very interested in the initial question here: is there a production-level implementation of memory-only shuffle, configurable in the same spirit as the MEMORY_ONLY and MEMORY_AND_DISK storage levels, as mentioned in this ticket, https://github.com/apache/spark/pull/5403 ? It would be

Encoding issue reading text file

2018-10-18 Thread Masf
Hi everyone, I'm trying to read a text file with UTF-16LE but I'm getting weird characters like this: �� W h e n My code is this one: sparkSession .read .format("text") .option("charset", "UTF-16LE") .load("textfile.txt") I'm using Spark 2.3.1. Any idea to fix
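If the text source is ignoring the charset option, one possible workaround is to decode the bytes explicitly instead. A sketch under the assumption that each individual file fits in executor memory (the path is a placeholder):

    import java.nio.charset.StandardCharsets
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()
    import spark.implicits._

    // Read each file as raw bytes, decode as UTF-16LE ourselves,
    // then split into lines.
    val lines = spark.sparkContext
      .binaryFiles("textfile.txt")
      .flatMap { case (_, stream) =>
        new String(stream.toArray, StandardCharsets.UTF_16LE).split("\r?\n")
      }
      .toDF("value")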

Re: Spark In Memory Shuffle

2018-10-18 Thread ☼ R Nair
Thanks, great info. Will try and let all know. Best On Thu, Oct 18, 2018, 3:12 AM onmstester onmstester wrote: > create the ramdisk: > mount tmpfs /mnt/spark -t tmpfs -o size=2G > > then point spark.local.dir to the ramdisk, which depends on your > deployment strategy, for me it was through

Re: Unable to read multiple JSON.Gz File.

2018-10-18 Thread Mahender Sarangam
Hi Jyoti, we are using HDInsight Spark 2.2. Are there any setting differences for the latest cluster version? /mahender On 10/2/2018 1:48 PM, Jyoti Ranjan Mahapatra wrote: Hi Mahendar, which version of Spark and Hadoop are you using? I tried it on Spark 2.3.1 with Hadoop 2.7.3 and it works for
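For reference, a minimal sketch of reading several gzipped JSON files in one pass (the path and glob are placeholders, not the poster's layout); gzip is decompressed transparently by the underlying Hadoop input format, but each .gz file is unsplittable and becomes a single partition:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()

    // The glob picks up every .json.gz under the directory.
    val df = spark.read.json("/data/input/*.json.gz")
    df.printSchema()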

Re: Spark In Memory Shuffle

2018-10-18 Thread onmstester onmstester
Create the ramdisk: mount tmpfs /mnt/spark -t tmpfs -o size=2G Then point spark.local.dir to the ramdisk; how depends on your deployment strategy. For me it was through the SparkConf object before passing it to SparkContext: conf.set("spark.local.dir","/mnt/spark") To validate that spark is
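Putting those steps together as a small sketch (the tmpfs mount must already exist on every executor host; the app name and the toy shuffle job are placeholders of mine):

    import org.apache.spark.{SparkConf, SparkContext}

    // Assumes the ramdisk was created on each node beforehand:
    //   mount tmpfs /mnt/spark -t tmpfs -o size=2G
    val conf = new SparkConf()
      .setAppName("ramdisk-shuffle-test")
      .set("spark.local.dir", "/mnt/spark")   // shuffle/spill files land here

    val sc = new SparkContext(conf)

    // A quick shuffle to generate local files under /mnt/spark.
    sc.parallelize(1 to 1000000).map(x => (x % 100, x)).groupByKey().count()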

FP-Growth clarification for Market Basket Analysis

2018-10-18 Thread aditipatel
Hello everyone, we are working to develop an "Also-Bought" field using Spark FP-Growth. We use the transform function to find products that are sold together most often. When we use transform to determine consequents, are the predictions ordered from most to least likely?
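A minimal sketch of the workflow being asked about (the toy baskets and the support/confidence thresholds are placeholders); the associationRules output carries each rule's confidence explicitly, which is one way to rank consequents yourself if the ordering inside the prediction column turns out not to be guaranteed:

    import org.apache.spark.ml.fpm.FPGrowth
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()
    import spark.implicits._

    // Toy baskets; in practice this comes from the transaction data.
    val baskets = Seq(
      Seq("bread", "butter"),
      Seq("bread", "butter", "milk"),
      Seq("bread", "milk")
    ).toDF("items")

    val model = new FPGrowth()
      .setItemsCol("items")
      .setMinSupport(0.3)
      .setMinConfidence(0.5)
      .fit(baskets)

    // Rules with antecedent, consequent and confidence, rankable by confidence.
    model.associationRules.show(false)

    // transform fills the prediction column with consequents per basket.
    model.transform(baskets).show(false)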