Processing json document

2016-07-06 Thread Lan Jiang
Hi there, Spark has provided JSON document processing for a long time. In most examples I see, each line is a JSON object in the sample file. That is the easiest case. But how can we process a JSON document that does not conform to this standard format (one line per JSON object)? Here
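A minimal Scala sketch of one common workaround, assuming each input file holds a single (possibly multi-line) JSON document; the path is hypothetical:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// Each input file is assumed to contain one JSON document that may span many lines.
def readMultiLineJson(sc: SparkContext, sqlContext: SQLContext) = {
  // wholeTextFiles keeps each file's full content together, so objects can span lines.
  val whole = sc.wholeTextFiles("hdfs:///data/multiline-json/").map { case (_, content) => content }
  // read.json accepts an RDD[String], parsing each element as one JSON document.
  sqlContext.read.json(whole)
}
```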

Re: Is that possible to launch spark streaming application on yarn with only one machine?

2016-07-06 Thread Rabin Banerjee
In yarn-cluster mode the driver runs in the AM, so you can find the logs in that AM's log. Open the ResourceManager UI and check for the job and its logs, or run yarn logs -applicationId . In yarn-client mode the driver is the same JVM from which you are launching, so you get it in the local log. On

Re: Question regarding structured data and partitions

2016-07-06 Thread Koert Kuipers
Spark does keep some information on the partitions of an RDD, namely the partitioning/partitioner. GroupSorted is an extension for key-value RDDs that also keeps track of the ordering, allowing for faster joins, non-reduce type operations on very large groups of values per key, etc. See here:

Question regarding structured data and partitions

2016-07-06 Thread Omid Alipourfard
Hi, Why doesn't Spark keep information about the structure of the RDDs or the partitions within RDDs? Say that I use repartitionAndSortWithinPartitions, which results in sorted partitions. With sorted partitions, lookups should be super fast (binary search?), yet I still need to go through the
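A minimal Scala sketch of the operation being discussed, with hypothetical data; after it runs, each partition is sorted by key, which is what makes ordered per-partition scans or lookups feasible:

```scala
import org.apache.spark.{HashPartitioner, SparkContext}

// Hypothetical (userId, event) pairs.
def sortedPartitions(sc: SparkContext) = {
  val pairs = sc.parallelize(Seq((3, "c"), (1, "a"), (2, "b"), (1, "d")))

  // One shuffle: hash-partition by key and sort each partition by key.
  val sorted = pairs.repartitionAndSortWithinPartitions(new HashPartitioner(4))

  // Each partition's iterator now yields its keys in ascending order.
  sorted.mapPartitionsWithIndex { (idx, iter) =>
    iter.map { case (k, v) => s"partition $idx: $k -> $v" }
  }
}
```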

Structured Streaming Comparison to AMPS

2016-07-06 Thread craigjar
I have been doing several Spark PoC projects recently and the latest one involved the new 2.0 experimental feature Structured Streaming. My PoC ended up being a non-starter as I quickly realized the stream to stream joins are not implemented yet. I believe this feature will be immensely powerful

Re: Is that possible to launch spark streaming application on yarn with only one machine?

2016-07-06 Thread Yu Wei
Launching via client deploy mode, it works again. I'm still a little confused about the behavior difference for cluster and client mode on a single machine. Thanks, Jared From: Mich Talebzadeh Sent: Wednesday, July 6, 2016 9:46:11

Re: How to spin up Kafka using docker and use for Spark Streaming Integration tests

2016-07-06 Thread swetha kasireddy
Can this Docker image be used to spin up a Kafka cluster in a CI/CD pipeline like Jenkins to run the integration tests? Or can it be done only on a local machine that has Docker installed? I assume that the box where the CI/CD pipeline runs should have Docker installed, correct? On Mon, Jul 4,

Logs of spark driver in yarn-client mode.

2016-07-06 Thread Egor Pahomov
Hi, I have the following issue: I have Zeppelin, which is set up in yarn-client mode. A notebook sits in the Running state for a long period of time with 0% done, and I do not see even an accepted application in YARN. To be able to understand what's going on, I need the logs of the Spark driver, which is trying to connect to

Re: How to spin up Kafka using docker and use for Spark Streaming Integration tests

2016-07-06 Thread swetha kasireddy
The application output is that it inserts data into Cassandra at the end of every batch. On Mon, Jul 4, 2016 at 5:20 AM, Lars Albertsson wrote: > I created such a setup for a client a few months ago. It is pretty > straightforward, but it can take some work to get all the wires

Presentation in London: Running Spark on Hive or Hive on Spark

2016-07-06 Thread Mich Talebzadeh
Dear forum members I will be presenting on the topic of "Running Spark on Hive or Hive on Spark, your mileage varies" in Future of Data: London *Details* *Organized by: Hortonworks * *Date: Wednesday, July

Re: Difference between DataFrame.write.jdbc and DataFrame.write.format("jdbc")

2016-07-06 Thread Xiao Li
Hi, Dragisa, Just submitted a PR implementing the save API: https://github.com/apache/spark/pull/14077 Let me know if you have any questions, Xiao 2016-07-06 10:41 GMT-07:00 Rabin Banerjee : > Hi Buddy, > > I used both, but DataFrame.write.jdbc is old, and

Re: Is Spark suited for replacing a batch job using many database tables?

2016-07-06 Thread Andreas Bauer
The job works just fine. DB2 also performs very well. But I'm supposed to investigate alternatives. Thanks for the advice regarding Apache Drill. I'll definitely have a look! Best regards, Andreas Sorry, I was assuming that you wanted to build the data lake in Hadoop rather than just

Re: Is Spark suited for replacing a batch job using many database tables?

2016-07-06 Thread Michael Segel
Sorry, I was assuming that you wanted to build the data lake in Hadoop rather than just reading from DB2. (Data Lakes need to be built correctly. ) So, slightly different answer. Yes, you can do this… You will end up with an immutable copy of the data that you would read in serially. Then

Re: Is Spark suited for replacing a batch job using many database tables?

2016-07-06 Thread Mich Talebzadeh
Well, your mileage varies depending on what you want to do. I suggest that you do a POC to find out exactly what benefits you are going to get and whether the approach is going to pay off. Spark does not have a CBO like DB2 or Oracle, but it provides DAG and in-memory capabilities. Use something basic like

Re: Is Spark suited for replacing a batch job using many database tables?

2016-07-06 Thread Andreas Bauer
Thanks for the advice. I have to retrieve the basic data from the DB2 tables but afterwards I'm pretty free to transform the data as needed.  I think you need to learn the basics of how to build a ‘data lake/pond/sewer’ first. The short answer is yes. The longer answer is that you need

Re: Is Spark suited for replacing a batch job using many database tables?

2016-07-06 Thread Andreas Bauer
Yes, that was the idea to cache the tables in memory as they should neatly fit.  The loading time is no problem as the job is not time critical. The critical point is the constant access to the DB2 tables, which consumes costly MIPS, and this I hope to replace with the cached version.  So, I'll

Re: Is Spark suited for replacing a batch job using many database tables?

2016-07-06 Thread Michael Segel
I think you need to learn the basics of how to build a ‘data lake/pond/sewer’ first. The short answer is yes. The longer answer is that you need to think more about translating a relational model in to a hierarchical model, something that I seriously doubt has been taught in schools in a

Re: Is Spark suited for replacing a batch job using many database tables?

2016-07-06 Thread Mich Talebzadeh
Well, you can try it. I have done it with Oracle, SAP Sybase IQ, etc., but you need to be aware of the time the JDBC connection is going to take to load the data. Sounds like your tables are pretty small, so they can be cached. Where are you going to store the result set, etc.? HTH Dr Mich Talebzadeh
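A minimal Scala sketch of the approach discussed in this thread (read the DB2 tables once over JDBC, cache them, then run the batch logic in Spark); the URL, credentials, table and column names are hypothetical, and the DB2 JDBC driver jar is assumed to be on the classpath:

```scala
// sqlContext is assumed to be available (spark-shell / HiveContext).
val props = new java.util.Properties()
props.setProperty("user", "db2user")
props.setProperty("password", "secret")
props.setProperty("driver", "com.ibm.db2.jcc.DB2Driver")

// Load the source table once over JDBC ...
val orders = sqlContext.read.jdbc("jdbc:db2://host:50000/SAMPLE", "ORDERS", props)

// ... and cache it, so later steps reuse the in-memory copy instead of hitting DB2 again.
orders.cache()
orders.registerTempTable("orders")

val perCustomer = sqlContext.sql("SELECT customer_id, COUNT(*) AS cnt FROM orders GROUP BY customer_id")
```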

Re: Is Spark suited for replacing a batch job using many database tables?

2016-07-06 Thread Jean Georges Perrin
Right now, I am having "fun" with Spark and 26446249960843350 datapoints on my MacBook Air, but my small friend is suffering... From my experience: You will be able to do the job with Spark. You can try to load everything on a dev machine, no need to have a server, a workstation might be

Re: Is Spark suited for replacing a batch job using many database tables?

2016-07-06 Thread Andreas Bauer
In fact, yes. On 6 July 2016 at 21:46:34 MESZ, Mich Talebzadeh wrote: So you want to use Spark as the query engine accessing DB2 tables via JDBC? Dr Mich Talebzadeh LinkedIn

Re: Is Spark suited for replacing a batch job using many database tables?

2016-07-06 Thread Mich Talebzadeh
So you want to use Spark as the query engine accessing DB2 tables via JDBC? Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw *

Re: Is Spark suited for replacing a batch job using many database tables?

2016-07-06 Thread Andreas Bauer
The SQL statements are embedded in a PL/1 program using DB2 running on z/OS. Quite powerful, but expensive and, above all, shared with other jobs in the company. The whole job takes approx. 20 minutes. So I was thinking to use Spark and let the Spark job run on 10 or 20 virtual instances, which I

Re: Is Spark suited for replacing a batch job using many database tables?

2016-07-06 Thread Jean Georges Perrin
What are you doing it on right now? > On Jul 6, 2016, at 3:25 PM, dabuki wrote: > > I was thinking about to replace a legacy batch job with Spark, but I'm not > sure if Spark is suited for this use case. Before I start the proof of > concept, I wanted to ask for opinions. >

Is Spark suited for replacing a batch job using many database tables?

2016-07-06 Thread dabuki
I was thinking about replacing a legacy batch job with Spark, but I'm not sure if Spark is suited for this use case. Before I start the proof of concept, I wanted to ask for opinions. The legacy job works as follows: A file (100k to 1 million entries) is iterated. Every row contains a (book) order

spark classloader question

2016-07-06 Thread Chen Song
Hi, I ran into problems using the class loader in Spark. In my code (run within an executor), I explicitly load classes using the ContextClassLoader as below. Thread.currentThread().getContextClassLoader() The jar containing the classes to be loaded is added via the --jars option in
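A minimal Scala sketch of loading a class by name inside an executor task with the context class loader; the class name is hypothetical and its jar is assumed to be shipped via --jars:

```scala
import org.apache.spark.rdd.RDD

def applyPlugin(rdd: RDD[String]): RDD[String] = rdd.mapPartitions { iter =>
  // Resolve the class on the executor through the thread's context class loader.
  val cl = Thread.currentThread().getContextClassLoader
  val clazz = Class.forName("com.example.MyPlugin", true, cl)
  val plugin = clazz.newInstance()
  iter.map(record => s"$plugin handled $record")
}
```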

Re: Spark streaming. Strict discretizing by time

2016-07-06 Thread Cody Koeninger
> Yes and I sent you results. It is appropriate only with known parameters of > input data stream. No, as far as I can tell from your posts in this thread and your linked project, you only tested with auto.offset.reset smallest and a large backlog. That's not what I advised you to do. Don't

Re: Spark Left outer Join issue using programmatic sql joins

2016-07-06 Thread Mich Talebzadeh
This will work in Hive. I don't know why you are getting null values. val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc) HiveContext.sql("use test") val e = HiveContext.table("emp") val d = HiveContext.table("dept") val rs = e.join(d,e("deptid")===d("deptid"), "fullouter")

Re: Difference between DataFrame.write.jdbc and DataFrame.write.format("jdbc")

2016-07-06 Thread Rabin Banerjee
Hi Buddy, I used both, but DataFrame.write.jdbc is old and will work if you provide a table name; it won't work if you provide custom queries. Whereas DataFrame.write.format is more generic and works perfectly not only with table names but also with custom queries. Hence I recommend using the

Maintain complete state for updateStateByKey

2016-07-06 Thread Sunita Arvind
Hello Experts, I have a requirement of maintaining a list of ids for every customer for all of time. I should be able to provide count distinct ids on demand. All the examples I have seen so far indicate I need to maintain counts directly. My concern is, I will not be able to identify cumulative
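A minimal Scala sketch of keeping the full set of ids per customer in state (rather than just counts), so a distinct count is simply the size of the set; the DStream and field names are hypothetical, and checkpointing must be enabled:

```scala
import org.apache.spark.streaming.dstream.DStream

// pairs: (customerId, id); ssc.checkpoint(...) is required for stateful operations.
def distinctIdCounts(pairs: DStream[(String, String)]): DStream[(String, Int)] = {
  // Keep the complete set of ids per customer as the state.
  val updateIds = (newIds: Seq[String], state: Option[Set[String]]) =>
    Some(state.getOrElse(Set.empty[String]) ++ newIds)

  // Distinct-id count on demand, derived from the complete state.
  pairs.updateStateByKey(updateIds).mapValues(_.size)
}
```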

Re: Difference between DataFrame.write.jdbc and DataFrame.write.format("jdbc")

2016-07-06 Thread Dragisa Krsmanovic
Yes, I had the save() at the end. I truncated example to highlight the difference and forgot to put back the save() It would be great to have the same behavior (and same code used) for both jdbc() and format("jdbc"). Thank you. On Wed, Jul 6, 2016 at 10:21 AM, Xiao Li

Re: Spark streaming. Strict discretizing by time

2016-07-06 Thread rss rss
> > If you aren't processing messages as fast as you receive them, you're > going to run out of kafka retention regardless of whether you're using > Spark or Flink. Again, physics. It's just a question of what > compromises you choose. Yes. I wrote about it. But in case of Flink you will have

Re: Difference between DataFrame.write.jdbc and DataFrame.write.format("jdbc")

2016-07-06 Thread Xiao Li
Hi, Dragisa, Your second way is incomplete, right? To get the error you showed, you need to put save() there. Yeah, we can implement the trait CreatableRelationProvider for JDBC. Then, you will not see that error. Will submit a PR for that. Thanks, Xiao 2016-07-06 10:05 GMT-07:00 Dragisa

Re: Spark Left outer Join issue using programmatic sql joins

2016-07-06 Thread Radha krishna
Hi Mich, here I gave just sample data; I have some GBs of files in HDFS and am performing left outer joins on those files, and the final result I am going to store in a Vertica database table. There are no duplicate columns in the target table, but for the non-matching rows' columns I want to insert

Difference between DataFrame.write.jdbc and DataFrame.write.format("jdbc")

2016-07-06 Thread Dragisa Krsmanovic
I was expecting to get the same results with both: dataFrame.write.mode(SaveMode.Overwrite).jdbc(dbUrl, "my_table", props) and dataFrame.write.mode(SaveMode.Overwrite).format("jdbc").options(opts).option("dbtable", "my_table") In the first example, it behaves as expected. It creates a new
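A minimal Scala sketch of the two call forms being compared, with hypothetical options; note the explicit save() required by the generic form (which, per this thread, failed for JDBC writes until the save API was implemented):

```scala
import java.util.Properties
import org.apache.spark.sql.{DataFrame, SaveMode}

// dataFrame and dbUrl are assumed to exist as in the original post.
def writeBothWays(dataFrame: DataFrame, dbUrl: String): Unit = {
  val props = new Properties()
  props.setProperty("user", "me")
  props.setProperty("password", "secret")

  // Form 1: the dedicated jdbc() writer.
  dataFrame.write.mode(SaveMode.Overwrite).jdbc(dbUrl, "my_table", props)

  // Form 2: the generic data source API; note the explicit save() at the end.
  dataFrame.write
    .mode(SaveMode.Overwrite)
    .format("jdbc")
    .option("url", dbUrl)
    .option("dbtable", "my_table")
    .option("user", "me")
    .option("password", "secret")
    .save()
}
```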

Re: Spark Left outer Join issue using programmatic sql joins

2016-07-06 Thread Mich Talebzadeh
What do you mean by a database table here? You have repeating column names in the table, namely deptid. 0: jdbc:hive2://rhes564:10010/default> SELECT * FROM emp e LEFT OUTER JOIN dept d ON e.deptid = d.deptid; INFO : OK +---+-+---+---+--+--+ | e.emp_id

Re: Spark streaming. Strict discretizing by time

2016-07-06 Thread Cody Koeninger
If you aren't processing messages as fast as you receive them, you're going to run out of kafka retention regardless of whether you're using Spark or Flink. Again, physics. It's just a question of what compromises you choose. If by "growing of a processing window time of Spark" you mean a

Re: Spark Left outer Join issue using programmatic sql joins

2016-07-06 Thread radha
Hi, thanks all, it's working fine; the issue was some whitespace in the dept id. I have one more doubt: for the non-matching records it shows the word null, and even when I write into HDFS it also shows null. How can we avoid writing null for the non-matching columns? I want just an empty value ("")

Re: Spark Left outer Join issue using programmatic sql joins

2016-07-06 Thread Radha krishna
Hi, thanks all, it's working fine; the issue was some whitespace in the dept id. I have one more doubt: for the non-matching records it shows the word null, and even when I write into HDFS it also shows null. How can we avoid writing null for the non-matching columns? I want just an empty value ("")
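A minimal Scala sketch of one way to get empty strings instead of null on the non-matching side of the join before writing out; the DataFrames and column names are hypothetical:

```scala
import org.apache.spark.sql.DataFrame

// emp and dept are assumed DataFrames with a deptid column, as in the thread.
def joinWithoutNulls(emp: DataFrame, dept: DataFrame): DataFrame = {
  val joined = emp.join(dept, emp("deptid") === dept("deptid"), "left_outer")
  // na.fill("") replaces null in all string columns; numeric columns would need
  // their own fill (e.g. joined.na.fill(0)) if an empty string is not applicable.
  joined.na.fill("")
}
```

The cleaned DataFrame can then be written to HDFS without the literal word null appearing for non-matching rows.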

Re: Spark streaming. Strict discretizing by time

2016-07-06 Thread rss rss
Ok, thanks. But really this is not a full solution. If processing time grows, window time will grow too. So with Spark there are really two points where latency grows. The first is the delay in processing messages in the Kafka queue due to the physical limitations of the computer system. And

Re: Spark Left outer Join issue using programmatic sql joins

2016-07-06 Thread Mich Talebzadeh
Hive query is hive> SELECT * FROM emp e LEFT OUTER JOIN dept d ON e.deptid = d.deptid; Status: Finished successfully in 2.02 seconds OK 1001aba 10 10 DEV 1002abs 20 20 TEST 1003abd 10 10 DEV 1001aba 10 10 DEV 1002abs

Re: Spark Left outer Join issue using programmatic sql joins

2016-07-06 Thread Mich Talebzadeh
This is very simple in Hive Status: Running (Hive on Spark job[1]) Job Progress Format CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost] 2016-07-06 17:17:16,006 Stage-1_0: 0(+1)/1 2016-07-06 17:17:17,011 Stage-1_0: 1/1

Re: Spark streaming. Strict discretizing by time

2016-07-06 Thread Cody Koeninger
Then double the upper limit you have set until the processing time approaches the batch time. On Wed, Jul 6, 2016 at 11:06 AM, rss rss wrote: > Ok, with: > > .set("spark.streaming.backpressure.enabled","true") > .set("spark.streaming.receiver.maxRate", "1") >

Re: Spark streaming. Strict discretizing by time

2016-07-06 Thread rss rss
Ok, with: .set("spark.streaming.backpressure.enabled","true") .set("spark.streaming.receiver.maxRate", "1") .set("spark.streaming.kafka.maxRatePerPartition", "1") I have something like *** Processing time: 5626

Re: SnappyData and Structured Streaming

2016-07-06 Thread Benjamin Kim
Jags, Thanks for the details. This makes things much clearer. I saw in the Spark roadmap that version 2.1 will add the SQL capabilities mentioned here. It looks like, gradually, the Spark community is coming to the same conclusions that the SnappyData folks have come to a while back in terms

Re: SnappyData and Structured Streaming

2016-07-06 Thread Jags Ramnarayan
The plan is to fully integrate with the new structured streaming API and implementation in an upcoming release. But, we will continue offering several extensions. Few noted below ... - the store (streaming sink) will offer a lot more capabilities like transactions, replicated tables, partitioned

Re: Spark streaming. Strict discretizing by time

2016-07-06 Thread Cody Koeninger
The configuration you set is spark.streaming.receiver.maxRate. The direct stream is not a receiver. As I said in my first message in this thread, and as the pages at http://spark.apache.org/docs/latest/streaming-kafka-integration.html#approach-2-direct-approach-no-receivers and
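A minimal Scala sketch of the settings that actually apply to the direct (receiver-less) Kafka stream discussed here: backpressure plus a per-partition ceiling that bounds the first batches; the rate value is hypothetical:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("kafka-direct")
  .set("spark.streaming.backpressure.enabled", "true")
  // Records per second per Kafka partition; caps batch size until backpressure has throughput data.
  .set("spark.streaming.kafka.maxRatePerPartition", "10000")
// spark.streaming.receiver.maxRate is ignored here, because the direct stream has no receiver.
```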

Re: Spark Left outer Join issue using programmatic sql joins

2016-07-06 Thread Rabin Banerjee
Checked in spark-shell with spark 1.5.0 scala> val emmpdat = sc.textFile("empfile"); emmpdat: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[23] at textFile at :21 scala> case class EMP (id:Int , name : String , deptId: Int) defined class EMP scala> val empdf = emmpdat.map((f) => {val

Re: Spark streaming. Strict discretizing by time

2016-07-06 Thread rss rss
Ok, thanks. I tried to set a minimal max rate as a starting point. However, in general I don't know the initial throughput. BTW, it would be very useful to explain this in https://spark.apache.org/docs/latest/streaming-programming-guide.html#performance-tuning And really with

Re: how to select first 50 value of each group after group by?

2016-07-06 Thread Anton Okolnychyi
The following resources should be useful: https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-sql-windows.html The last link should have the exact solution 2016-07-06 16:55 GMT+02:00 Tal

Re: how to select first 50 value of each group after group by?

2016-07-06 Thread Tal Grynbaum
You can use the rank window function to rank each row in the group, and then filter the rows with rank <= 50. On Wed, Jul 6, 2016, 14:07 wrote: > hi there > I have a DF with 3 columns: id , pv, location.(the rows are already > grouped by location and sort by pv in des) I wanna
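A minimal Scala sketch of the window-function approach described above, assuming a DataFrame with columns id, pv and location:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{rank, desc}

def top50PerLocation(df: DataFrame): DataFrame = {
  // Rank rows within each location by pv descending, then keep the top 50.
  val w = Window.partitionBy("location").orderBy(desc("pv"))
  // rank() gives ties the same rank; use row_number() for exactly 50 rows per group.
  df.withColumn("rnk", rank().over(w)).filter("rnk <= 50").drop("rnk")
}
```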

Re: Spark streaming. Strict discretizing by time

2016-07-06 Thread Cody Koeninger
The direct stream determines batch sizes on the driver, in advance of processing. If you haven't specified a maximum batch size, how would you suggest the backpressure code determine how to limit the first batch? It has no data on throughput until at least one batch is completed. Again, this is

Re: SnappyData and Structured Streaming

2016-07-06 Thread Benjamin Kim
Jags, I should have been more specific. I am referring to what I read at http://snappydatainc.github.io/snappydata/streamingWithSQL/, especially the Streaming Tables part. It roughly coincides with the Streaming DataFrames outlined here

SparkR | Exception in invokeJava: SparkR + Windows standalone cluster

2016-07-06 Thread AC24
I am trying to set up a Spark standalone cluster and run SparkR. I have one master and 2 slaves set up on Windows Server 2012. I have tried running the SparkR + RStudio example as shown in this blog:[

Re: It seemed JavaDStream.print() did not work when launching via yarn on a single node

2016-07-06 Thread Yu Wei
Actually, Time was always printed out. Is there any better method to debug the problem? I want to update the Spark/MQTT code and rebuild it again to debug further. Thanks, Jared From: Saisai Shao Sent: Wednesday, July 6, 2016 9:24 PM To: Yu

Re: SnappyData and Structured Streaming

2016-07-06 Thread Jags Ramnarayan
Ben, Note that Snappydata's primary objective is to be a distributed in-memory DB for mixed workloads (i.e. streaming with transactions and analytic queries). On the other hand, Spark, till date, is primarily designed as a processing engine over myriad storage engines (SnappyData being one).

Re: Spark streaming. Strict discretizing by time

2016-07-06 Thread rss rss
Hello, thanks, I tried to .set("spark.streaming.backpressure.enabled","true") but result is negative. Therefore I have prepared small test https://github.com/rssdev10/spark-kafka-streaming How to run: * git clone https://github.com/rssdev10/spark-kafka-streaming.git

Re: Is that possible to launch spark streaming application on yarn with only one machine?

2016-07-06 Thread Mich Talebzadeh
I don't think deploy-mode cluster will work. Try --master yarn --deploy-mode client. FYI - *Spark Local* - Spark runs on the local host. This is the simplest setup and best suited for learners who want to understand different concepts of Spark and those performing unit testing. -

Re: Get both feature importance and ROC curve from a random forest classifier

2016-07-06 Thread Mathieu D
well, sounds trivial now ... ! thanks ;-) 2016-07-02 10:04 GMT+02:00 Yanbo Liang : > Hi Mathieu, > > Using the new ml package to train a RandomForestClassificationModel, you > can get feature importance. Then you can convert the prediction result to > RDD and feed it into
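A minimal Scala sketch of the approach Yanbo describes, assuming Spark 2.0's ml package, an already fitted RandomForestClassificationModel and a test DataFrame with a double-typed label column:

```scala
import org.apache.spark.ml.classification.RandomForestClassificationModel
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.sql.DataFrame

def importanceAndRoc(model: RandomForestClassificationModel, testData: DataFrame): Unit = {
  // Per-feature importance from the fitted model.
  println(model.featureImportances)

  // Score the test set, then feed (P(class = 1), label) pairs to the ROC metrics.
  val scoreAndLabel = model.transform(testData)
    .select("probability", "label")
    .rdd
    .map(row => (row.getAs[Vector](0)(1), row.getDouble(1)))

  val metrics = new BinaryClassificationMetrics(scoreAndLabel)
  println(s"Area under ROC = ${metrics.areaUnderROC()}")
}
```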

Re: It seemed JavaDStream.print() did not work when launching via yarn on a single node

2016-07-06 Thread Saisai Shao
DStream.print() will collect some of the data to driver and display, please see the implementation of DStream.print() RDD.take() will collect some of the data to driver. Normally the behavior should be consistent between cluster and local mode, please find out the root cause of this problem,

spark 2.0 bloom filters

2016-07-06 Thread matd
A question for Spark developers: I see that Bloom filters have been integrated in Spark 2.0. Hadoop already has some Bloom filter implementations, especially a dynamic one
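A minimal Scala sketch of the DataFrame-side Bloom filter that ships with Spark 2.0; the column name and sizing parameters are hypothetical:

```scala
import org.apache.spark.sql.DataFrame

def buildFilter(df: DataFrame): Unit = {
  // Expected number of distinct items and target false-positive rate.
  val bf = df.stat.bloomFilter("user_id", 1000000L, 0.03)

  // false means definitely absent, true means possibly present.
  println(bf.mightContain("user-42"))
}
```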

Re: It seemed JavaDStream.print() did not work when launching via yarn on a single node

2016-07-06 Thread Yu Wei
How about DStream.print()? Does it invoke collect before printing on the driver? From: Sean Owen Sent: Wednesday, July 6, 2016 8:20:36 PM To: Rabin Banerjee Cc: Yu Wei; user@spark.apache.org Subject: Re: It seemed JavaDStream.print() did not work

Re: Spark Left outer Join issue using programmatic sql joins

2016-07-06 Thread ayan guha
looks like a data issue to me. Either EMP or DEPT has spaces in dept id for deptid=20,30. Did you check in hive cli? On Wed, Jul 6, 2016 at 10:33 PM, radha wrote: > Hi All, > > Please check below for the code and input and output, i think the output is > not correct, i am

Spark Left outer Join issue using programmatic sql joins

2016-07-06 Thread radha
Hi All, Please check below for the code, input and output; I think the output is not correct. Am I missing anything? Please guide. Code public class Test { private static JavaSparkContext jsc = null; private static SQLContext sqlContext = null; private static

Spark Left outer Join issue using programmatic sql joins

2016-07-06 Thread Radha krishna
Hi All, Please check below for the code, input and output; I think the output is not correct. Am I missing anything? Please guide. Code public class Test { private static JavaSparkContext jsc = null; private static SQLContext sqlContext = null; private static Configuration hadoopConf = null;

Re: It seemed JavaDStream.print() did not work when launching via yarn on a single node

2016-07-06 Thread Sean Owen
dstream.foreachRDD(_.collect.foreach(println)) On Wed, Jul 6, 2016 at 1:19 PM, Rabin Banerjee wrote: > Collect will help then . May be something like this, > foreachRDD( rdd => { for(item <- rdd.collect().toArray) { println(item); } > }) >

Re: It seemed JavaDStream.print() did not work when launching via yarn on a single node

2016-07-06 Thread Rabin Banerjee
Collect will help then. Maybe something like this: foreachRDD( rdd => { for(item <- rdd.collect().toArray) { println(item); } }) On Wed, Jul 6, 2016 at 5:46 PM, Sean Owen wrote: > That's still causing the element to be printed on the remote > executors, not the driver.

Re: It seemed JavaDStream.print() did not work when launching via yarn on a single node

2016-07-06 Thread Sean Owen
That's still causing the element to be printed on the remote executors, not the driver. You'd have to collect the RDD and then println, really. Also see DStream.print() On Wed, Jul 6, 2016 at 1:07 PM, Rabin Banerjee wrote: > It's not working because , you haven't

Re: It seemed JavaDStream.print() did not work when launching via yarn on a single node

2016-07-06 Thread Rabin Banerjee
It's not working because you haven't collected the data. Try something like DStream.foreachRDD((rdd) => { rdd.foreach(println) }) Thanks, Rabin On Wed, Jul 6, 2016 at 5:05 PM, Yu Wei wrote: > Hi guys, > > > It seemed that when launching application via yarn on single

It seemed JavaDStream.print() did not work when launching via yarn on a single node

2016-07-06 Thread Yu Wei
Hi guys, It seemed that when launching the application via YARN on a single node, JavaDStream.print() did not work. However, occasionally it worked. If I launch the same application in local mode, it always works. The code is as below: SparkConf conf = new SparkConf().setAppName("Monitor");

Re: Is that possible to launch spark streaming application on yarn with only one machine?

2016-07-06 Thread Yu Wei
Hi Deng, I tried the same code again. It seemed that when launching the application via YARN on a single node, JavaDStream.print() did not work. However, occasionally it worked. If I launch the same application in local mode, it always works. The code is as below: SparkConf conf = new

how to select first 50 value of each group after group by?

2016-07-06 Thread luohui20001
Hi there, I have a DF with 3 columns: id, pv, location (the rows are already grouped by location and sorted by pv in descending order). I want to get the first 50 id values per location. I checked the API of DataFrame, GroupedData and pairRDD and found no match. Is there a way to do this naturally?

streaming new data into bigger parquet file

2016-07-06 Thread Igor Berman
Hi, I was reading the following tutorial https://docs.cloud.databricks.com/docs/latest/databricks_guide/07%20Spark%20Streaming/08%20Write%20Output%20To%20S3.html about streaming data to S3 from the databricks_guide, and it states that sometimes I need to do compaction of small files (e.g. from Spark Streaming)
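A minimal Scala sketch of a periodic compaction pass over the small Parquet files a streaming job leaves behind; the paths and target file count are hypothetical, and sqlContext is assumed to be available:

```scala
// Read the many small files written by the streaming job ...
val small = sqlContext.read.parquet("s3a://bucket/events/streaming-output/")

// ... and rewrite them as a handful of larger files in a compacted location.
small.coalesce(16)
  .write
  .mode("append")
  .parquet("s3a://bucket/events/compacted/")
```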

Re: Spark Task failure with File segment length as negative

2016-07-06 Thread Priya Ch
Has anyone resolved this? Thanks, Padma CH On Wed, Jun 22, 2016 at 4:39 PM, Priya Ch wrote: > Hi All, > > I am running a Spark application with 1.8 TB of data (which is stored in Hive > tables format). I am reading the data using HiveContext and processing it. >

where is open source Distributed service framework use for spark??

2016-07-06 Thread ????????
I want my server to be distributed and used with Spark. Which good open-source service frameworks are there? Thanks