Re: Graphx triplet comparison

2016-12-13 Thread balaji9058
Hi, thanks for the reply. Here is my code:

class BusStopNode(val name: String, val mode: String, val maxpasengers: Int) extends Serializable

case class busstop(override val name: String, override val mode: String, val shelterId: String, override val maxpasengers: Int) extends

Re: Streaming Batch Oddities

2016-12-13 Thread Bryan Jeffrey
All, Any thoughts? I can run another couple of experiments to try to narrow the problem. The total data volume in the repartition is around 60GB / batch. Regards, Bryan Jeffrey On Tue, Dec 13, 2016 at 12:11 PM, Bryan Jeffrey wrote: > Hello. > > I have a current

Belief propagation algorithm is open sourced

2016-12-13 Thread Ulanov, Alexander
Dear Spark developers and users, HPE has open sourced an implementation of the belief propagation (BP) algorithm for Apache Spark. BP is a popular message-passing algorithm for performing inference in probabilistic graphical models; it provides exact inference for graphical models without loops.

Parallel read from OracleDB slow, fails on large tables

2016-12-13 Thread epettijohn
I'm running the following code in an attempt to import some tables from our Oracle DB into Spark (2.0.2), and then save them as Parquet tables in S3 (using S3A). The code runs, and does create query-able tables in our Hive Metastore, but it only creates one connection to Oracle (I was expecting
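A common reason a JDBC read opens only a single connection is that none of the partitioning options were supplied, so Spark issues one query. A minimal sketch of a partitioned read in Spark 2.0 (all connection details, table and column names here are illustrative, and the partition column must be numeric):

```scala
// Sketch: partitioned JDBC read, assuming a numeric column ID whose values
// span roughly 1..10,000,000. Every name below is a placeholder.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/SERVICE")
  .option("dbtable", "MY_SCHEMA.BIG_TABLE")
  .option("user", "spark_user")
  .option("password", "secret")
  .option("partitionColumn", "ID") // numeric column to range-partition on
  .option("lowerBound", "1")
  .option("upperBound", "10000000")
  .option("numPartitions", "16")   // 16 concurrent connections to Oracle
  .load()

df.write.parquet("s3a://my-bucket/big_table")
```

With `numPartitions` set, each partition issues its own range query, so the read fans out across the cluster instead of funneling through one connection.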

Re: Third party library

2016-12-13 Thread vineet chadha
Thanks Jakob for sharing the link. Will try it out. Regards, Vineet On Tue, Dec 13, 2016 at 3:00 PM, Jakob Odersky wrote: > Hi Vineet, > great to see you solved the problem! Since this just appeared in my > inbox, I wanted to take the opportunity for a shameless plug: >

Re: [Spark-SQL] collect_list() support for nested collection

2016-12-13 Thread Ninad Shringarpure
Exactly what I was looking for. Thank you so much!! On Tue, Dec 13, 2016 at 6:15 PM Michael Armbrust wrote: > Yes > > >

Re: [Spark-SQL] collect_list() support for nested collection

2016-12-13 Thread Michael Armbrust
Yes https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/4464261896877850/2840265927289860/latest.html On Tue, Dec 13, 2016 at 10:43 AM, Ninad Shringarpure wrote: > > Hi Team, > > Does Spark 2.0 support
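The pattern the linked notebook demonstrates is, roughly, aggregating structs with collect_list, which Spark 2.0 supports for non-primitive column types. A sketch with illustrative column names:

```scala
import org.apache.spark.sql.functions.{collect_list, struct}
import spark.implicits._

// Sketch: collect a nested collection (array of structs) per key
val df = Seq(("a", 1, "x"), ("a", 2, "y"), ("b", 3, "z"))
  .toDF("key", "num", "label")

val nested = df
  .groupBy("key")
  .agg(collect_list(struct("num", "label")).as("items"))
// 'items' is an array<struct<num:int,label:string>> column
```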

Re: Third party library

2016-12-13 Thread Jakob Odersky
Hi Vineet, great to see you solved the problem! Since this just appeared in my inbox, I wanted to take the opportunity for a shameless plug: https://github.com/jodersky/sbt-jni. In case you're using sbt and also developing the native library, this plugin may help with the pains of building and

Re: Graphx triplet comparison

2016-12-13 Thread Robineast
Not sure what you are asking. What's wrong with:

triplet1.filter(condition3)
triplet2.filter(condition3)

- Robin East, Spark GraphX in Action, Michael Malak and Robin East, Manning Publications Co. http://www.manning.com/books/spark-graphx-in-action -- View this message in context:
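In context, the suggestion is to apply the same predicate to each triplet set independently. A sketch using the BusStopNode vertex type from the original question (condition3 stands in for whatever comparison the thread is about; the graphs and edge type are illustrative):

```scala
import org.apache.spark.graphx.EdgeTriplet

// Sketch: filtering the triplets of two graphs with one shared predicate
def condition3(t: EdgeTriplet[BusStopNode, String]): Boolean =
  t.srcAttr.maxpasengers > t.dstAttr.maxpasengers

val matches1 = graph1.triplets.filter(condition3)
val matches2 = graph2.triplets.filter(condition3)
```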

Re: About Spark Multiple Shared Context with Spark 2.0

2016-12-13 Thread Calvin Jia
Hi, Alluxio will allow you to share or cache data in-memory between different Spark contexts by storing RDDs or Dataframes as a file in the Alluxio system. The files can then be accessed by any Spark job like a file in any other distributed storage system. These two blogs do a good job of
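The mechanism described above can be sketched as follows; the Alluxio master address and path are placeholders:

```scala
// Job A: persist a DataFrame into Alluxio so other contexts can see it
df.write.parquet("alluxio://alluxio-master:19998/shared/events")

// Job B (a separate SparkContext): read the same data back
val shared = spark.read.parquet("alluxio://alluxio-master:19998/shared/events")
```

Because the data lives in Alluxio rather than inside one SparkContext's block manager, any number of Spark jobs can read it without recomputation.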

Re: Third party library

2016-12-13 Thread vineet chadha
Thanks Steve and Kant. Apologies for the late reply as I was out on vacation. Got it working. For other users:

def loadResources() {
  System.loadLibrary("foolib")
  val MyInstance = new MyClass
  val retstr = MyInstance.foo("mystring") // the method being invoked
}

Fwd: [Spark-SQL] collect_list() support for nested collection

2016-12-13 Thread Ninad Shringarpure
Hi Team, Does Spark 2.0 support non-primitive types in collect_list for inserting nested collections? Would appreciate any references or samples. Thanks, Ninad

Re: Few questions on reliability of accumulators value.

2016-12-13 Thread Sudev A C
Thank you for the clarification. On Tue, Dec 13, 2016 at 1:27 AM Daniel Siegmann < dsiegm...@securityscorecard.io> wrote: > Accumulators are generally unreliable and should not be used. The answer > to (2) and (4) is yes. The answer to (3) is both. > > Here's a more in-depth explanation: >
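The unreliability being referred to shows up when an accumulator is updated inside a transformation: any recomputation of the RDD (or a retried task) re-applies the update. A minimal sketch:

```scala
// Sketch: why accumulators updated in transformations can over-count
val acc = sc.longAccumulator("rows")          // Spark 2.x accumulator API
val mapped = sc.parallelize(1 to 100).map { x =>
  acc.add(1)                                  // runs once per evaluation
  x
}
mapped.count()   // acc.value is 100
mapped.count()   // the RDD is recomputed: acc.value becomes 200, not 100
```

Updates made inside actions are applied exactly once per successful task; updates inside transformations carry no such guarantee, which is the basis of answers (2), (3), and (4) above.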

Re: Where is yarn-shuffle.jar in maven?

2016-12-13 Thread dhruve ashar
Hi Neal, From my understanding, the reason the shuffle jar is not available in Maven Central is that it is an external component dependent on the cluster manager version, at least in the case of YARN. For example, it would require the appropriate hadoop profile based on your underlying

Re: Where is yarn-shuffle.jar in maven?

2016-12-13 Thread Marcelo Vanzin
https://mvnrepository.com/artifact/org.apache.spark/spark-network-yarn_2.11/2.0.2 On Mon, Dec 12, 2016 at 9:56 PM, Neal Yin wrote: > Hi, > > For dynamic allocation feature, I need spark-xxx-yarn-shuffle.jar. In my > local spark build, I can see it. But in maven central, I
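For completeness, the coordinate above expressed as an sbt dependency (version matched to the Spark 2.0.2 discussed in the thread):

```scala
// build.sbt fragment: the YARN shuffle artifact from Maven Central
libraryDependencies += "org.apache.spark" %% "spark-network-yarn" % "2.0.2"
```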

Streaming Batch Oddities

2016-12-13 Thread Bryan Jeffrey
Hello. I have a current Spark 1.6.1 application that I am working to modify. The flow of the application looks something like the following: (Kafka) --> (Direct Stream Receiver) --> (Repartition) --> (Extract/Schematization Logic w/ RangePartitioner) --> Several Output Operations In the
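The flow described above, as a rough Spark Streaming 1.6 sketch (the Kafka receiver setup is elided; `parse` and the partition count are placeholders):

```scala
// Sketch: direct stream -> repartition -> schematize -> several outputs
val repartitioned = directStream.repartition(96)   // spread the ~60GB/batch
val schematized = repartitioned.transform { rdd =>
  rdd.map(parse)                                   // extract/schematize step
}
schematized.foreachRDD { rdd =>
  rdd.cache()      // avoid recomputing for each output operation
  // ... several output operations over rdd ...
  rdd.unpersist()
}
```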

Output Side Effects for different chain of operations

2016-12-13 Thread Chawla,Sumit
Hi All, I have a workflow with different steps in my program. Let's say these are steps A, B, C, D. Step B produces some temp files on each executor node. How can I add another step E which consumes these files? I understand the easiest choice is to copy all these temp files to any shared

About Spark Multiple Shared Context with Spark 2.0

2016-12-13 Thread Chetan Khatri
Hello Guys, What would be the approach to accomplish a Spark multiple shared context without Alluxio and with Alluxio, and what would be best practice to achieve parallelism and concurrency for Spark jobs. Thanks. -- Yours Aye, Chetan Khatri. M.+91 7 80574 Data Science Researcher INDIA

How to set classpath for a job that submit to Mesos cluster

2016-12-13 Thread Chanh Le
Hi everyone, I have a job that reads segment data from Druid and then converts it to CSV. When I run it in local mode it works fine. /home/airflow/spark-2.0.2-bin-hadoop2.7/bin/spark-submit --driver-memory 1g --master "local[4]" --files /home/airflow/spark-jobs/forecast_jobs/prod.conf --conf