Re: Graphx triplet comparison

2016-12-13 Thread balaji9058
Hi, thanks for the reply. Here is my code:

class BusStopNode(val name: String, val mode: String, val maxpasengers: Int) extends Serializable

case class busstop(override val name: String, override val mode: String, val shelterId: String, override val maxpasengers: Int) extends

Re: Streaming Batch Oddities

2016-12-13 Thread Bryan Jeffrey
All, Any thoughts? I can run another couple of experiments to try to narrow the problem. The total data volume in the repartition is around 60GB / batch. Regards, Bryan Jeffrey On Tue, Dec 13, 2016 at 12:11 PM, Bryan Jeffrey wrote: > Hello. > > I have a current

Belief propagation algorithm is open sourced

2016-12-13 Thread Ulanov, Alexander
Dear Spark developers and users, HPE has open sourced an implementation of the belief propagation (BP) algorithm for Apache Spark. BP is a popular message-passing algorithm for performing inference in probabilistic graphical models; it provides exact inference for graphical models without loops.

Parallel read from OracleDB slow, fails on large tables

2016-12-13 Thread epettijohn
I'm running the following code in an attempt to import some tables from our Oracle DB into Spark (2.0.2), and then save them as Parquet tables in S3 (using S3A). The code runs, and does create query-able tables in our Hive Metastore, but it only creates one connection to Oracle (I was expecting
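A common reason a JDBC read opens only a single connection is that none of the partitioning options were supplied, so Spark issues one query. A minimal sketch of a partitioned read in Spark 2.0 (all connection details, table and column names here are illustrative, and the partition column must be numeric):

```scala
// Sketch: partitioned JDBC read, assuming a numeric column ID whose values
// span roughly 1..10,000,000. Every name below is a placeholder.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/SERVICE")
  .option("dbtable", "MY_SCHEMA.BIG_TABLE")
  .option("user", "spark_user")
  .option("password", "secret")
  .option("partitionColumn", "ID") // numeric column to range-partition on
  .option("lowerBound", "1")
  .option("upperBound", "10000000")
  .option("numPartitions", "16")   // 16 concurrent connections to Oracle
  .load()

df.write.parquet("s3a://my-bucket/big_table")
```

With `numPartitions` set, each partition issues its own range query, so the read fans out across the cluster instead of funneling through one connection.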

Re: Third party library

2016-12-13 Thread vineet chadha
Thanks Jakob for sharing the link. Will try it out. Regards, Vineet On Tue, Dec 13, 2016 at 3:00 PM, Jakob Odersky wrote: > Hi Vineet, > great to see you solved the problem! Since this just appeared in my > inbox, I wanted to take the opportunity for a shameless plug: >

Re: [Spark-SQL] collect_list() support for nested collection

2016-12-13 Thread Ninad Shringarpure
Exactly what I was looking for. Thank you so much!! On Tue, Dec 13, 2016 at 6:15 PM Michael Armbrust wrote: > Yes > > >

Re: [Spark-SQL] collect_list() support for nested collection

2016-12-13 Thread Michael Armbrust
Yes https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/4464261896877850/2840265927289860/latest.html On Tue, Dec 13, 2016 at 10:43 AM, Ninad Shringarpure wrote: > > Hi Team, > > Does Spark 2.0 support
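The pattern the linked notebook demonstrates is, roughly, aggregating structs with collect_list, which Spark 2.0 supports for non-primitive column types. A sketch with illustrative column names:

```scala
import org.apache.spark.sql.functions.{collect_list, struct}
import spark.implicits._

// Sketch: collect a nested collection (array of structs) per key
val df = Seq(("a", 1, "x"), ("a", 2, "y"), ("b", 3, "z"))
  .toDF("key", "num", "label")

val nested = df
  .groupBy("key")
  .agg(collect_list(struct("num", "label")).as("items"))
// 'items' is an array<struct<num:int,label:string>> column
```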

Re: Third party library

2016-12-13 Thread Jakob Odersky
Hi Vineet, great to see you solved the problem! Since this just appeared in my inbox, I wanted to take the opportunity for a shameless plug: https://github.com/jodersky/sbt-jni. In case you're using sbt and also developing the native library, this plugin may help with the pains of building and

Re: Graphx triplet comparison

2016-12-13 Thread Robineast
Not sure what you are asking. What's wrong with:

triplet1.filter(condition3)
triplet2.filter(condition3)

- Robin East, Spark GraphX in Action, Michael Malak and Robin East, Manning Publications Co. http://www.manning.com/books/spark-graphx-in-action -- View this message in context:
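In context, the suggestion is to apply the same predicate to each triplet set independently. A sketch using the BusStopNode vertex type from the original question (condition3 stands in for whatever comparison the thread is about; the graphs and edge type are illustrative):

```scala
import org.apache.spark.graphx.EdgeTriplet

// Sketch: filtering the triplets of two graphs with one shared predicate
def condition3(t: EdgeTriplet[BusStopNode, String]): Boolean =
  t.srcAttr.maxpasengers > t.dstAttr.maxpasengers

val matches1 = graph1.triplets.filter(condition3)
val matches2 = graph2.triplets.filter(condition3)
```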

Re: About Spark Multiple Shared Context with Spark 2.0

2016-12-13 Thread Calvin Jia
Hi, Alluxio will allow you to share or cache data in-memory between different Spark contexts by storing RDDs or Dataframes as a file in the Alluxio system. The files can then be accessed by any Spark job like a file in any other distributed storage system. These two blogs do a good job of
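The mechanism described above can be sketched as follows; the Alluxio master address and path are placeholders:

```scala
// Job A: persist a DataFrame into Alluxio so other contexts can see it
df.write.parquet("alluxio://alluxio-master:19998/shared/events")

// Job B (a separate SparkContext): read the same data back
val shared = spark.read.parquet("alluxio://alluxio-master:19998/shared/events")
```

Because the data lives in Alluxio rather than inside one SparkContext's block manager, any number of Spark jobs can read it without recomputation.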

Re: Third party library

2016-12-13 Thread vineet chadha
Thanks Steve and Kant. Apologies for the late reply as I was out on vacation. Got it working. For other users:

def loadResources() {
  System.loadLibrary("foolib")
  val MyInstance = new MyClass
  val retstr = MyInstance.foo("mystring") // the method being invoked
}

Fwd: [Spark-SQL] collect_list() support for nested collection

2016-12-13 Thread Ninad Shringarpure
Hi Team, Does Spark 2.0 support non-primitive types in collect_list for inserting nested collections? Would appreciate any references or samples. Thanks, Ninad

Re: Few questions on reliability of accumulators value.

2016-12-13 Thread Sudev A C
Thank you for the clarification. On Tue, Dec 13, 2016 at 1:27 AM Daniel Siegmann < dsiegm...@securityscorecard.io> wrote: > Accumulators are generally unreliable and should not be used. The answer > to (2) and (4) is yes. The answer to (3) is both. > > Here's a more in-depth explanation: >
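The unreliability being referred to shows up when an accumulator is updated inside a transformation: any recomputation of the RDD (or a retried task) re-applies the update. A minimal sketch:

```scala
// Sketch: why accumulators updated in transformations can over-count
val acc = sc.longAccumulator("rows")          // Spark 2.x accumulator API
val mapped = sc.parallelize(1 to 100).map { x =>
  acc.add(1)                                  // runs once per evaluation
  x
}
mapped.count()   // acc.value is 100
mapped.count()   // the RDD is recomputed: acc.value becomes 200, not 100
```

Updates made inside actions are applied exactly once per successful task; updates inside transformations carry no such guarantee, which is the basis of answers (2), (3), and (4) above.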

Re: Where is yarn-shuffle.jar in maven?

2016-12-13 Thread dhruve ashar
Hi Neal, From my understanding, the reason the shuffle jar is not available in Maven Central is that it is an external component dependent on the cluster manager version, at least in the case of YARN. For example, it would require the appropriate hadoop profile based on your underlying

Re: Where is yarn-shuffle.jar in maven?

2016-12-13 Thread Marcelo Vanzin
https://mvnrepository.com/artifact/org.apache.spark/spark-network-yarn_2.11/2.0.2 On Mon, Dec 12, 2016 at 9:56 PM, Neal Yin wrote: > Hi, > > For dynamic allocation feature, I need spark-xxx-yarn-shuffle.jar. In my > local spark build, I can see it. But in maven central, I
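For completeness, the coordinate above expressed as an sbt dependency (version matched to the Spark 2.0.2 discussed in the thread):

```scala
// build.sbt fragment: the YARN shuffle artifact from Maven Central
libraryDependencies += "org.apache.spark" %% "spark-network-yarn" % "2.0.2"
```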

Streaming Batch Oddities

2016-12-13 Thread Bryan Jeffrey
Hello. I have a current Spark 1.6.1 application that I am working to modify. The flow of the application looks something like the following: (Kafka) --> (Direct Stream Receiver) --> (Repartition) --> (Extract/Schematization Logic w/ RangePartitioner) --> Several Output Operations In the
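The flow described above, as a rough Spark Streaming 1.6 sketch (the Kafka receiver setup is elided; `parse` and the partition count are placeholders):

```scala
// Sketch: direct stream -> repartition -> schematize -> several outputs
val repartitioned = directStream.repartition(96)   // spread the ~60GB/batch
val schematized = repartitioned.transform { rdd =>
  rdd.map(parse)                                   // extract/schematize step
}
schematized.foreachRDD { rdd =>
  rdd.cache()      // avoid recomputing for each output operation
  // ... several output operations over rdd ...
  rdd.unpersist()
}
```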

Output Side Effects for different chain of operations

2016-12-13 Thread Chawla,Sumit
Hi All, I have a workflow with different steps in my program. Let's say these are steps A, B, C, D. Step B produces some temp files on each executor node. How can I add another step E which consumes these files? I understand the easiest choice is to copy all these temp files to any shared

About Spark Multiple Shared Context with Spark 2.0

2016-12-13 Thread Chetan Khatri
Hello Guys, What would be the approach to accomplish a Spark multiple shared context without Alluxio and with Alluxio, and what would be best practice to achieve parallelism and concurrency for Spark jobs. Thanks. -- Yours Aye, Chetan Khatri. M.+91 7 80574 Data Science Researcher INDIA

How to set classpath for a job that submit to Mesos cluster

2016-12-13 Thread Chanh Le
Hi everyone, I have a job that reads segment data from Druid and then converts it to CSV. When I run it in local mode it works fine. /home/airflow/spark-2.0.2-bin-hadoop2.7/bin/spark-submit --driver-memory 1g --master "local[4]" --files /home/airflow/spark-jobs/forecast_jobs/prod.conf --conf