Re: Save DataFrame to Hive Table

2016-02-29 Thread Jeff Zhang
The following line does not execute the SQL, so the table is not created. Add .show() at the end to execute it. hiveContext.sql("CREATE TABLE IF NOT EXISTS TableName (key INT, value STRING)") On Tue, Mar 1, 2016 at 2:22 PM, Yogesh Vyas wrote: > Hi, > > I have created
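A minimal Scala sketch of the pattern described above, assuming a DataFrame df whose schema matches the table; the .show() call forces the otherwise-lazy DDL to run, and insertInto then appends the DataFrame rows to the Hive table:

    // Force the CREATE TABLE statement to execute by adding an action
    hiveContext.sql("CREATE TABLE IF NOT EXISTS TableName (key INT, value STRING)").show()

    // df is a hypothetical DataFrame with a matching (key, value) schema
    df.write.insertInto("TableName")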

Save DataFrame to Hive Table

2016-02-29 Thread Yogesh Vyas
Hi, I have created a DataFrame in Spark, and now I want to save it directly into a Hive table. How do I do it? I have created the Hive table using the following hiveContext: HiveContext hiveContext = new org.apache.spark.sql.hive.HiveContext(sc.sc()); hiveContext.sql("CREATE TABLE IF NOT

Re: [ERROR]: Spark 1.5.2 + Hbase 1.1 + Hive 1.2 + HbaseIntegration

2016-02-29 Thread Ted Yu
16/02/29 23:09:34 INFO ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=9 watcher=hconnection-0x26fa89a20x0, quorum=localhost:2181, baseZNode=/hbase Since baseZNode didn't match what you set in hbase-site.xml, the cause was likely that hbase-site.xml being

Update edge weight in graphx

2016-02-29 Thread naveen.marri
Hi, I'm trying to implement an algorithm using GraphX which involves updating edge weights during every iteration. The format is [Node]-[Node]--[Weight]. Ex: I checked the GraphX docs but didn't find any resources on changing the weight of an edge for the same RDD. I know RDDs
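Since GraphX graphs (like RDDs) are immutable, the usual approach is to build a new graph each iteration rather than mutate edge weights in place. A minimal sketch of that pattern using Graph.mapEdges, assuming an existing SparkContext sc and with a placeholder decay rule standing in for the real update:

    import org.apache.spark.graphx.{Edge, Graph}

    // Small example graph with Double edge weights ([Node]--[Node]--[Weight])
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 1.0), Edge(2L, 3L, 0.5)))
    var graph = Graph.fromEdges(edges, defaultValue = 0L)

    // Each iteration produces a NEW graph; the previous one is left untouched
    for (_ <- 1 to 10) {
      graph = graph.mapEdges(e => e.attr * 0.9)   // placeholder weight-update rule
    }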

Support virtualenv in PySpark

2016-02-29 Thread Jeff Zhang
I have created a JIRA for this feature; comments and feedback are welcome about how to improve it and whether it's valuable for users. https://issues.apache.org/jira/browse/SPARK-13587 Here's some background info and status of this work. Currently, it's not easy for users to add third party

RE: Use maxmind geoip lib to process ip on Spark/Spark Streaming

2016-02-29 Thread Silvio Fiorito
I’ve used the code below with SparkSQL. I was using this with Spark 1.4 but should still be good with 1.6. In this case I have a UDF to do the lookup, but for Streaming you’d just have a lambda to apply in a map function, so no UDF wrapper. import org.apache.spark.sql.functions._ import
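A rough sketch of the UDF-wrapper idea for the batch/SQL case, assuming a DataFrame df with an ip column; lookupCountry is a stub standing in for whatever MaxMind reader call you use (keep the reader in a per-executor singleton so it is not serialized with the closure). For Streaming you would call the same function inside a map on the DStream instead of wrapping it in a UDF:

    import org.apache.spark.sql.functions.udf

    // Stub: replace with the actual MaxMind database lookup
    def lookupCountry(ip: String): String = "UNKNOWN"

    val geoLookup = udf((ip: String) => lookupCountry(ip))
    val enriched = df.withColumn("country", geoLookup(df("ip")))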

Re: perl Kafka::Producer, “Kafka::Exception::Producer”, “code”, -1000, “message”, "Invalid argument

2016-02-29 Thread Vinti Maheshwari
Hi Cody, Sorry, i realized afterwards, i should not ask here. My actual program is spark-streaming and i used kafka for input streaming. Thanks, Vinti On Mon, Feb 29, 2016 at 1:46 PM, Cody Koeninger wrote: > Does this issue involve Spark at all? Otherwise you may have

Re: [ERROR]: Spark 1.5.2 + Hbase 1.1 + Hive 1.2 + HbaseIntegration

2016-02-29 Thread Ted Yu
16/02/29 23:09:34 INFO ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error) Is your cluster secure cluster ? bq. Trace : Was there any output after 'Trace :' ? Was hbase-site.xml accessible to your Spark job

[ERROR]: Spark 1.5.2 + Hbase 1.1 + Hive 1.2 + HbaseIntegration

2016-02-29 Thread Divya Gehlot
Hi, I am getting an error when I am trying to connect to a Hive table (which was created through HBaseIntegration) in Spark. Steps I followed: *Hive table creation code*: CREATE EXTERNAL TABLE IF NOT EXISTS TEST(NAME STRING,AGE INT) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH

RE: Spark UI standalone "crashes" after an application finishes

2016-02-29 Thread Mohammed Guller
I believe the OP is referring to the application UI on port 4040. The application UI on port 4040 is available only while the application is running. As per the documentation: To view the web UI after the fact, set spark.eventLog.enabled to true before starting the application. This configures
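A small sketch of the relevant configuration, set before the SparkContext is created; the event-log directory is a placeholder and must exist and be readable by the history server:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("my-app")
      .set("spark.eventLog.enabled", "true")             // persist the event log
      .set("spark.eventLog.dir", "hdfs:///spark-events")  // placeholder location
    val sc = new SparkContext(conf)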

Re: Spark UI standalone "crashes" after an application finishes

2016-02-29 Thread Sea
Hi, Sumona: It's a bug in older Spark versions; it is fixed in Spark 1.6.0. After the application completes, the Spark master loads the event log into memory, and it is synchronous because of the actor. If the event log is big, the Spark master will hang for a long time, and you cannot submit any applications,

Fwd: Mapper side join with DataFrames API

2016-02-29 Thread Deepak Gopalakrishnan
Hello All, I'm trying to join 2 dataframes A and B with a sqlContext.sql("SELECT * FROM A INNER JOIN B ON A.a=B.a"); Now what I have done is that I have registeredTempTables for A and B after loading these DataFrames from different sources. I need the join to be really fast and I was wondering
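One way to get a map-side join in Spark SQL 1.5+, sketched with the table and column names from the question: hint the smaller DataFrame with broadcast() so it is replicated to every executor and the large side is never shuffled. Spark also does this automatically for tables whose size is below spark.sql.autoBroadcastJoinThreshold.

    import org.apache.spark.sql.functions.broadcast

    // dfA and dfB stand for the registered DataFrames A and B; B is assumed small
    val joined = dfA.join(broadcast(dfB), dfA("a") === dfB("a"))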

Error when trying to insert data to a Parquet data source in HiveQL

2016-02-29 Thread SRK
Hi, I seem to be getting the following error when I try to insert data to a parquet datasource. Any idea as to why this is happening? org.apache.hadoop.hive.ql.metadata.HiveException: parquet.hadoop.MemoryManager$1: New Memory allocation 1045004 bytes is smaller than the minimum allocation size

Re: Stateful Operation on JavaPairDStream Help Needed !!

2016-02-29 Thread Shixiong(Ryan) Zhu
Could you post the screenshot of the Streaming DAG and also the driver log? It would be great if you have a simple producer for us to debug. On Mon, Feb 29, 2016 at 1:39 AM, Abhishek Anand wrote: > Hi Ryan, > > Its not working even after removing the reduceByKey. > >

DataSet Evidence

2016-02-29 Thread Steve Lewis
I have a relatively complex Java object that I would like to use in a dataset if I say Encoder<MyType> evidence = Encoders.kryo(MyType.class); JavaRDD<MyType> rddMyType = generateRDD(); // some code Dataset<MyType> datasetMyType = sqlCtx.createDataset(rddMyType.rdd(), evidence); I get one column - the whole object
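Encoders.kryo serializes the whole object into a single binary column, which is why only one column appears. If MyType is a Java bean (getters/setters), a bean encoder maps each property to its own column; a hedged Scala sketch of that alternative:

    import org.apache.spark.sql.Encoders

    // Unlike Encoders.kryo, a bean encoder gives one column per bean property
    val beanEncoder = Encoders.bean(classOf[MyType])
    val datasetMyType = sqlCtx.createDataset(rddMyType.rdd)(beanEncoder)
    datasetMyType.printSchema()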

Re: Spark UI standalone "crashes" after an application finishes

2016-02-29 Thread Shixiong(Ryan) Zhu
Do you mean you cannot access Master UI after your application completes? Could you check the master log? On Mon, Feb 29, 2016 at 3:48 PM, Sumona Routh wrote: > Hi there, > I've been doing some performance tuning of our Spark application, which is > using Spark 1.2.1

Spark UI standalone "crashes" after an application finishes

2016-02-29 Thread Sumona Routh
Hi there, I've been doing some performance tuning of our Spark application, which is using Spark 1.2.1 standalone. I have been using the spark metrics to graph out details as I run the jobs, as well as the UI to review the tasks and stages. I notice that after my application completes, or is near

Re: Spark for client

2016-02-29 Thread Mich Talebzadeh
Thank you very much, both. Zeppelin looks promising. Basically, as I understand it, it runs an agent on a given port (I chose 21999) on the host where Spark is installed. I created a notebook and am running scripts through there. One thing for sure, the notebook just returns the results rather than all the other stuff that

RE: a basic question on first use of PySpark shell and example, which is failing

2016-02-29 Thread Taylor, Ronald C
I guess I should also point out that I do an export CLASSPATH in my .bash_profile file, so the CLASSPATH info should be usable by the PySpark shell that I invoke. Ron Ronald C. Taylor, Ph.D. Computational Biology & Bioinformatics Group Pacific Northwest National Laboratory (U.S. Dept of

RE: a basic question on first use of PySpark shell and example, which is failing

2016-02-29 Thread Taylor, Ronald C
Hi Yin, My Classpath is set to: CLASSPATH=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/jars/*:/people/rtaylor/SparkWork/DataAlgUtils:. And there is indeed a spark-core.jar in the ../jars subdirectory, though it is not named precisely “spark-core.jar”. It has a version number in its name,

Re: a basic question on first use of PySpark shell and example, which is failing

2016-02-29 Thread Yin Yang
RDDOperationScope is in spark-core_2.1x jar file. 7148 Mon Feb 29 09:21:32 PST 2016 org/apache/spark/rdd/RDDOperationScope.class Can you check whether the spark-core jar is in classpath ? FYI On Mon, Feb 29, 2016 at 1:40 PM, Taylor, Ronald C wrote: > Hi Jules,

Re: Spark on Windows platform

2016-02-29 Thread Steve Loughran
On 29 Feb 2016, at 13:40, gaurav pathak > wrote: Thanks Jorn. Any guidance on how to get started with getting SPARK on Windows, is highly appreciated. Thanks & Regards Gaurav Pathak you are at risk of seeing stack traces when

Re: perl Kafka::Producer, “Kafka::Exception::Producer”, “code”, -1000, “message”, "Invalid argument

2016-02-29 Thread Cody Koeninger
Does this issue involve Spark at all? Otherwise you may have better luck on a perl or kafka related list. On Mon, Feb 29, 2016 at 3:26 PM, Vinti Maheshwari wrote: > Hi All, > > I wrote kafka producer using kafka perl api, But i am getting error when i > am passing

Re: Spark Integration Patterns

2016-02-29 Thread Alexander Pivovarov
There is spark-jobserver (SJS), which is a REST interface for Spark and Spark SQL. You can deploy your jar file with the job implementations to spark-jobserver and use its REST API to submit jobs in sync or async mode; in async mode you need to poll SJS to get the job result. The job result might be actual data in JSON or a path

Re: Flattening Data within DataFrames

2016-02-29 Thread Kevin Mellott
Thanks Michal - this is exactly what I need. On Mon, Feb 29, 2016 at 11:40 AM, Michał Zieliński < zielinski.mich...@gmail.com> wrote: > Hi Kevin, > > This should help: > > https://databricks.com/blog/2016/02/09/reshaping-data-with-pivot-in-spark.html > > On 29 February 2016 at 16:54, Kevin

RE: Spark Integration Patterns

2016-02-29 Thread skaarthik oss
Check out http://toree.incubator.apache.org/. It might help with your need. From: moshir mikael [mailto:moshir.mik...@gmail.com] Sent: Monday, February 29, 2016 5:58 AM To: Alex Dzhagriev Cc: user Subject: Re: Spark Integration Patterns Thanks,

Re: Spark for client

2016-02-29 Thread Minudika Malshan
+Adding resources https://zeppelin.incubator.apache.org/docs/latest/interpreter/spark.html https://zeppelin.incubator.apache.org Minudika Malshan Undergraduate Department of Computer Science and Engineering University of Moratuwa. *Mobile : +94715659887* *LinkedIn* :

Re: Spark for client

2016-02-29 Thread Minudika Malshan
Hi, I think zeppelin spark interpreter will give a solution to your problem. Regards. Minudika Minudika Malshan Undergraduate Department of Computer Science and Engineering University of Moratuwa. *Mobile : +94715659887* *LinkedIn* : https://lk.linkedin.com/in/minudika On Tue, Mar 1, 2016 at

Re: Spark for client

2016-02-29 Thread Sabarish Sasidharan
Zeppelin? Regards Sab On 01-Mar-2016 12:27 am, "Mich Talebzadeh" wrote: > Hi, > > Is there such thing as Spark for client much like RDBMS client that have > cut down version of their big brother useful for client connectivity but > cannot be used as server. > > Thanks

Spark for client

2016-02-29 Thread Mich Talebzadeh
Hi, Is there such a thing as Spark for clients, much like RDBMS clients that are cut-down versions of their big brother, useful for client connectivity but that cannot be used as a server? Thanks Dr Mich Talebzadeh LinkedIn *

Re: LDA topic Modeling spark + python

2016-02-29 Thread Bryan Cutler
The input into LDA.train needs to be an RDD of a list with the first element an integer (id) and the second a pyspark.mllib.Vector object containing real numbers (term counts), i.e. an RDD of [doc_id, vector_of_counts]. From your example, it looks like your corpus is a list with a zero-based

Re: Unresolved dep when building project with spark 1.6

2016-02-29 Thread Josh Rosen
Have you tried removing the leveldbjni files from your local ivy cache? My hunch is that this is a problem with some local cache state rather than the dependency simply being unavailable / not existing (note that the error message was "origin location must be absolute:[...]", not that the files

Re: Spark 1.5 on Mesos

2016-02-29 Thread Ashish Soni
What is the best practice? I have everything running as Docker containers on a single host (Mesos and Marathon also as Docker containers) and everything comes up fine, but when I try to launch the spark shell I get the below error: SQL context available as sqlContext. scala> val data =

Re: spark 1.6 new memory management - some issues with tasks not using all executors

2016-02-29 Thread Koert Kuipers
Setting spark.shuffle.reduceLocality.enabled=false worked for me, thanks. Is there any reference to the benefits of setting reduceLocality to true? I am tempted to disable it across the board. On Mon, Feb 29, 2016 at 9:51 AM, Yin Yang wrote: > The default value for

Re: Flattening Data within DataFrames

2016-02-29 Thread Michał Zieliński
Hi Kevin, This should help: https://databricks.com/blog/2016/02/09/reshaping-data-with-pivot-in-spark.html On 29 February 2016 at 16:54, Kevin Mellott wrote: > Fellow Sparkers, > > I'm trying to "flatten" my view of data within a DataFrame, and am having >
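A rough sketch of the pivot approach from that post, applied to the product/category case in the question; the column names Name, Level and Category are assumptions about the raw data:

    import org.apache.spark.sql.functions.first

    // One input row per (product, category level) becomes one output row per
    // product, with a column per level (DataFrame pivot, Spark 1.6+)
    val flattened = df.groupBy("Name")
      .pivot("Level")
      .agg(first("Category"))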

Re: Spark 1.5 on Mesos

2016-02-29 Thread Tim Chen
No, you don't have to run Mesos in Docker containers to run Spark in Docker containers. Once you have the Mesos cluster running you can then specify the Spark configurations in your Spark job (i.e: spark.mesos.executor.docker.image=mesosphere/spark:1.6) and Mesos will automatically launch docker

Optimizing cartesian product using keys

2016-02-29 Thread eahlberg
Hello, To avoid computing all possible combinations, I'm trying to group values according to a certain key, and then compute the cartesian product of the values for each key, i.e.: Input [(k1, [v1]), (k1, [v2]), (k2, [v3])] Desired output: [(v1, v1), (v1, v2), (v2, v2), (v2, v1), (v3, v3)]
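A minimal sketch of the grouped approach, assuming an existing SparkContext sc: group by key first, then pair values only within each group, so the cartesian product is per key rather than over the whole data set:

    val input = sc.parallelize(Seq(("k1", "v1"), ("k1", "v2"), ("k2", "v3")))

    // Pair every value with every other value sharing the same key
    val perKeyPairs = input.groupByKey().flatMap { case (_, values) =>
      for (a <- values; b <- values) yield (a, b)
    }
    // => (v1,v1), (v1,v2), (v2,v1), (v2,v2), (v3,v3)

Note that with groupByKey all values for one key must fit in memory on a single executor, so this only works when no single key has an enormous number of values.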

Flattening Data within DataFrames

2016-02-29 Thread Kevin Mellott
Fellow Sparkers, I'm trying to "flatten" my view of data within a DataFrame, and am having difficulties doing so. The DataFrame contains product information, which includes multiple levels of categories (primary, secondary, etc). *Example Data (Raw):* *NameLevel

Re: [MLlib] How to set Loss to Gradient Boosted Tree in Java

2016-02-29 Thread diplomatic Guru
Thank you very much Kevin. On 29 February 2016 at 16:20, Kevin Mellott wrote: > I found a helper class that I think should do the trick. Take a look at > https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/tree/loss/Losses.scala >

Re: [MLlib] How to set Loss to Gradient Boosted Tree in Java

2016-02-29 Thread Kevin Mellott
I found a helper class that I think should do the trick. Take a look at https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/tree/loss/Losses.scala When passing the Loss, you should be able to do something like: Losses.fromString("leastSquaresError") On Mon,

Re: [MLlib] How to set Loss to Gradient Boosted Tree in Java

2016-02-29 Thread diplomatic Guru
It's strange, as you are correct that the doc does state it. But it's complaining about the constructor. When I clicked on the org.apache.spark.mllib.tree.loss.AbsoluteError class, this is what I see: @Since("1.2.0") @DeveloperApi object AbsoluteError extends Loss { /** * Method to calculate
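Since AbsoluteError is a singleton object implementing Loss (as the snippet above shows), it has no public constructor; from Scala you pass the object itself, or resolve a loss by name via Losses.fromString as Kevin showed. A hedged Scala sketch, assuming an existing SparkContext sc:

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.mllib.tree.GradientBoostedTrees
    import org.apache.spark.mllib.tree.configuration.BoostingStrategy
    import org.apache.spark.mllib.tree.loss.AbsoluteError

    // Tiny stand-in training set
    val trainingData = sc.parallelize(Seq(
      LabeledPoint(1.0, Vectors.dense(0.0, 1.0)),
      LabeledPoint(2.0, Vectors.dense(1.0, 0.0))))

    val boostingStrategy = BoostingStrategy.defaultParams("Regression")
    boostingStrategy.setNumIterations(3)
    boostingStrategy.setLoss(AbsoluteError)   // reference the object, don't construct it
    val model = GradientBoostedTrees.train(trainingData, boostingStrategy)

From Java, the usual way to reach a Scala singleton is the generated static instance (typically AbsoluteError$.MODULE$), which can then be passed to setLoss.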

Re: [MLlib] How to set Loss to Gradient Boosted Tree in Java

2016-02-29 Thread Kevin Mellott
Looks like it should be present in 1.3 at org.apache.spark.mllib.tree.loss.AbsoluteError spark.apache.org/docs/1.3.0/api/java/org/apache/spark/mllib/tree/loss/AbsoluteError.html On Mon, Feb 29, 2016 at 9:46 AM, diplomatic Guru wrote: > AbsoluteError() constructor is

Re: [MLlib] How to set Loss to Gradient Boosted Tree in Java

2016-02-29 Thread diplomatic Guru
AbsoluteError() constructor is undefined. I'm using Spark 1.3.0, maybe it is not ready for this version? On 29 February 2016 at 15:38, Kevin Mellott wrote: > I believe that you can instantiate an instance of the AbsoluteError class > for the *Loss* object, since

Re: [MLlib] How to set Loss to Gradient Boosted Tree in Java

2016-02-29 Thread Kevin Mellott
I believe that you can instantiate an instance of the AbsoluteError class for the *Loss* object, since that object implements the Loss interface. For example. val loss = new AbsoluteError() boostingStrategy.setLoss(loss) On Mon, Feb 29, 2016 at 9:33 AM, diplomatic Guru

Re: Spark 1.5 on Mesos

2016-02-29 Thread Ashish Soni
Yes, I read that, but there are not many details there. Is it true that we need to have Spark installed on each Mesos Docker container (master and slave)? ... Ashish On Fri, Feb 26, 2016 at 2:14 PM, Tim Chen wrote: > https://spark.apache.org/docs/latest/running-on-mesos.html should be

Re: kafka + mysql filtering problem

2016-02-29 Thread Cody Koeninger
You're getting confused about what code is running on the driver vs what code is running on the executor. Read http://spark.apache.org/docs/latest/programming-guide.html#understanding-closures-a-nameclosureslinka On Mon, Feb 29, 2016 at 8:00 AM, franco barrientos <

Re: [MLlib] How to set Loss to Gradient Boosted Tree in Java

2016-02-29 Thread Kevin Mellott
You can use the constructor that accepts a BoostingStrategy object, which will allow you to set the tree strategy (and other hyperparameters as well). *GradientBoostedTrees

[MLlib] How to set Loss to Gradient Boosted Tree in Java

2016-02-29 Thread diplomatic Guru
Hello guys, I think the default Loss algorithm is Squared Error for regression, but how do I change that to Absolute Error in Java. Could you please show me an example?

Re: spark 1.6 new memory management - some issues with tasks not using all executors

2016-02-29 Thread Yin Yang
The default value for spark.shuffle.reduceLocality.enabled is true. To reduce surprise to users of 1.5 and earlier releases, should the default value be set to false ? On Mon, Feb 29, 2016 at 5:38 AM, Lior Chaga wrote: > Hi Koret, > Try

Re: Spark on Windows platform

2016-02-29 Thread Gaurav Agarwal
> Hi > I am running spark on windows but a standalone one. > > Use this code > > SparkConf conf = new SparkConf().setMaster("local[1]").setAppName("spark").setSparkHome("c:/spark/bin/spark-submit.cmd"); > > Where sparkHome is the path where you extracted your spark binaries, up to bin/*.cmd > > You

kafka + mysql filtering problem

2016-02-29 Thread franco barrientos
Hi all, I want to read some filtering rules from mysql (jdbc mysql driver); specifically it's a char type containing a field and value to process against a kafka streaming input. The main idea is to drive this from a web UI (livy server). Any suggestions or guidelines? e.g., I have this: object

Re: Spark Integration Patterns

2016-02-29 Thread moshir mikael
Thanks, will check it too; however, I just want to use Spark core RDDs and standard data sources. On Mon, Feb 29, 2016 at 14:54, Alex Dzhagriev wrote: > Hi Moshir, > > Regarding the streaming, you can take a look at the spark streaming, the > micro-batching framework. If it

Re: Spark Integration Patterns

2016-02-29 Thread Alex Dzhagriev
Hi Moshir, Regarding the streaming, you can take a look at the spark streaming, the micro-batching framework. If it satisfies your needs it has a bunch of integrations. Thus, the source for the jobs could be Kafka, Flume or Akka. Cheers, Alex. On Mon, Feb 29, 2016 at 2:48 PM, moshir mikael

Re: Spark Integration Patterns

2016-02-29 Thread moshir mikael
Hi Alex, thanks for the link. Will check it. Does someone know of a more streamlined approach? On Mon, Feb 29, 2016 at 10:28, Alex Dzhagriev wrote: > Hi Moshir, > > I think you can use the rest api provided with Spark: >

Re: Spark on Windows platform

2016-02-29 Thread gaurav pathak
Thanks Jorn. Any guidance on how to get started with getting SPARK on Windows, is highly appreciated. Thanks & Regards Gaurav Pathak ~ sent from handheld device On Feb 29, 2016 5:34 AM, "Jörn Franke" wrote: > I think Hortonworks has a Windows Spark distribution. Maybe

Re: spark 1.6 new memory management - some issues with tasks not using all executors

2016-02-29 Thread Lior Chaga
Hi Koret, Try spark.shuffle.reduceLocality.enabled=false This is an undocumented configuration. See: https://github.com/apache/spark/pull/8280 https://issues.apache.org/jira/browse/SPARK-10567 It solved the problem for me (both with and without memory legacy mode) On Sun, Feb 28, 2016 at 11:16
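For reference, a one-line sketch of setting this programmatically before the context is created; it can equally go in spark-defaults.conf or on the spark-submit command line:

    val conf = new org.apache.spark.SparkConf()
      .set("spark.shuffle.reduceLocality.enabled", "false")   // undocumented, see SPARK-10567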

Re: Spark on Windows platform

2016-02-29 Thread Jörn Franke
I think Hortonworks has a Windows Spark distribution. Maybe Bigtop as well? > On 29 Feb 2016, at 14:27, gaurav pathak wrote: > > Can someone guide me the steps and information regarding, installation of > SPARK on Windows 7/8.1/10 , as well as on Windows Server.

Re: [Error]: Spark 1.5.2 + HiveHbase Integration

2016-02-29 Thread Ted Yu
Divya: Please try not to cross post your question. In your case HBase-common jar is needed. To find all the hbase jars needed, you can run 'mvn dependency:tree' and check its output. > On Feb 29, 2016, at 1:48 AM, Divya Gehlot wrote: > > Hi, > I am trying to access

Spark on Windows platform

2016-02-29 Thread gaurav pathak
Can someone guide me the steps and information regarding, installation of SPARK on Windows 7/8.1/10 , as well as on Windows Server. Also, it will be great to read your experiences in using SPARK on Windows platform. Thanks & Regards, Gaurav Pathak

Implementation of random algorithm walk in spark

2016-02-29 Thread naveen.marri
Hi, I'm new to Spark. I'm trying to compute similarity between users/products. I have a huge table which I can't self-join with the cluster I have, so I'm trying to implement the self join using a random walk methodology, which will give approximate results. The table is a bipartite graph with

Deadlock between UnifiedMemoryManager and BlockManager

2016-02-29 Thread Sea
Hi all, My spark version is 1.6.0. I found a deadlock in a production environment; can anyone help? I created an issue in JIRA: https://issues.apache.org/jira/browse/SPARK-13566 === "block-manager-slave-async-thread-pool-1": at

spark lda runs out of disk space

2016-02-29 Thread TheGeorge1918 .
Hi guys, I was running LDA with 2000 topics on 6G of compressed data, roughly 1.2 million docs. I used 3 AWS r3.8xlarge machines as core nodes. It turned out the Spark applications crashed after 3 or 4 iterations. Ganglia indicated the disk space was all consumed. I believe it’s the shuffle

What is the best approach to perform concurrent updates from different jobs to a in memory dataframe registered as a temp table?

2016-02-29 Thread Roger Marin
Hi all, I have multiple (>100) jobs running concurrently (sharing the same hive context) that are each appending new rows to the same dataframe registered as a temp table. Currently I am using unionAll and registering that dataframe again as a temp table in each job: Given an existing dataframe
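A minimal sketch of the unionAll-and-re-register step described above, using the Spark 1.x DataFrame API; "accumulated" and newRows are placeholders for the shared temp table name and the rows produced by one job:

    // Read the current contents, append the new rows, and re-register under the same name
    val existing = hiveContext.table("accumulated")
    val combined = existing.unionAll(newRows)
    combined.registerTempTable("accumulated")

Note that registerTempTable only rebinds the name to the new plan; by itself it does not coordinate the >100 concurrent jobs.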

[Help]: Steps to access hive table + Spark 1.5.2 + HbaseIntegration + Hive 1.2 + Hbase 1.1

2016-02-29 Thread Divya Gehlot
Hi, Can anybody help me by sharing the steps/examples of how to connect to a Hive table (which is being created using HBaseIntegration) through hiveContext in Spark? I googled but couldn't find a single example/document. Would really

Re: DirectFileOutputCommiter

2016-02-29 Thread Steve Loughran
> On 26 Feb 2016, at 06:24, Takeshi Yamamuro wrote: > > Hi, > > Great work! > What is the concrete performance gain of the committer on s3? > I'd like to know. > > I think there is no direct committer for files because these kinds of > committer has risks > to loss

Unresolved dep when building project with spark 1.6

2016-02-29 Thread Hao Ren
Hi, I am upgrading my project to spark 1.6. It seems that the deps are broken. Deps used in sbt val scalaVersion = "2.10" val sparkVersion = "1.6.0" val hadoopVersion = "2.7.1" // Libraries val scalaTest = "org.scalatest" %% "scalatest" % "2.2.4" % "test" val sparkSql = "org.apache.spark" %%

Re: Recommendation for a good book on Spark, beginner to moderate knowledge

2016-02-29 Thread charles li
Since Spark is under active development, learning it from a book will be somewhat outdated to some degree. I would suggest learning it in several ways, as below: - the Spark official documentation; trust me, you will go through this several times if you want to learn it well:

Re: Stateful Operation on JavaPairDStream Help Needed !!

2016-02-29 Thread Abhishek Anand
Hi Ryan, It's not working even after removing the reduceByKey. So, basically I am doing the following - reading from kafka - flatmap inside transform - mapWithState - rdd.count on the output of mapWithState. But to my surprise I still don't see checkpointing taking place. Is there any restriction to

Re: java.io.IOException: java.lang.reflect.InvocationTargetException on new spark machines

2016-02-29 Thread Abhishek Anand
Hi Ryan, I was able to resolve this issue. The /tmp location was mounted with the "noexec" option. Removing this noexec in the fstab resolved the issue. The snappy shared object file is created at the /tmp location, so either removing the noexec from the mount or changing the default temp location solved

Re: Spark Integration Patterns

2016-02-29 Thread Alex Dzhagriev
Hi Moshir, I think you can use the rest api provided with Spark: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala Unfortunately, I haven't found any documentation, but it looks fine. Thanks, Alex. On Sun, Feb 28, 2016 at 3:25

Re: Recommendation for a good book on Spark, beginner to moderate knowledge

2016-02-29 Thread Ashok Kumar
Thank you all for valuable advice. Much appreciated Best On Sunday, 28 February 2016, 21:48, Ashok Kumar wrote:   Hi Gurus, Appreciate if you recommend me a good book on Spark or documentation for beginner to moderate knowledge I very much like to skill myself on

Re: Spark Integration Patterns

2016-02-29 Thread moshir mikael
Well, I have a personal project where I want to build a *spreadsheet *on top of spark. I have a version of my app running on postgresql, which does not scale, and would like to move data processing to spark. You can import data, explore data, analyze data, visualize data ... You don't need to be

Re: DirectFileOutputCommiter

2016-02-29 Thread Takeshi Yamamuro
Hi, I think the essential culprit is that these committers are not idempotent; retry attempts will fail. See codes below for details; https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriterContainer.scala#L130 On Sat, Feb 27, 2016 at