Creating a Spark 3 Connector

2022-11-23 Thread Mitch Shepherd
Hello, I’m wondering if anyone can point me in the right direction for a Spark connector developer guide. I’m looking for information on writing a new connector for Spark to move data between Apache Spark and other systems. Any information would be helpful. I found a similar thing for

Dump table into file

2015-11-02 Thread Shepherd
Hi all, I have one table called "result" in the database, for example: /user/hive/warehouse/data_result.db/result. How do I export the table "result" into a local CSV file? Thanks a lot.
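One possible approach to the question above, sketched under several assumptions: Spark 2.0 or later with Hive support on the classpath, a database named `data_result` (inferred from the warehouse path, not confirmed by the post), and a hypothetical output location `file:///tmp/result_csv`. The helper name `exportTable` is made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

object ExportTableToCsv {
  // Hypothetical helper: writes the named table as CSV under outputDir.
  // Spark writes a directory of part files, not a single named file.
  def exportTable(spark: SparkSession, tableName: String, outputDir: String): Unit = {
    spark.table(tableName)
      .coalesce(1)               // one partition, so a single part file is produced
      .write
      .mode("overwrite")         // replace any previous export at this location
      .option("header", "true")  // emit column names as the first line
      .csv(outputDir)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("export-table-to-csv")
      .enableHiveSupport()       // needed to resolve tables in the Hive metastore
      .getOrCreate()

    // Assumed names: database "data_result", table "result", local output directory.
    exportTable(spark, "data_result.result", "file:///tmp/result_csv")

    spark.stop()
  }
}
```

The CSV data ends up in a `part-*.csv` file inside the output directory. On a cluster, a `file://` path is resolved on whichever node runs the write task, so this is most predictable in local mode; `coalesce(1)` also funnels all rows through one task, which only suits tables that fit comfortably on one machine.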

Get statistic result from RDD

2015-10-20 Thread Shepherd
Hi all, I am really new to Spark and Scala. I cannot get the statistical results from an RDD. Could someone help me with this? My current code is as follows: import org.apache.spark.SparkConf import org.apache.spark.SparkContext import org.apache.spark.SparkContext._ val sqlContext = new
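The preview cuts off before showing which statistics are wanted, so the following is only a minimal sketch of one common route: calling `stats()` on an `RDD[Double]`, which returns a `StatCounter`. The sample values and the helper name `summarize` are illustrative assumptions, not taken from the post.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.util.StatCounter

object RddStatsExample {
  // Hypothetical helper: computes count, mean, stdev, min and max in one pass.
  def summarize(values: RDD[Double]): StatCounter = values.stats()

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("rdd-stats").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Illustrative data; a real job would map its own records to Double first.
    val values = sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0))

    val s = summarize(values)
    println(s"count=${s.count} mean=${s.mean} stdev=${s.stdev} min=${s.min} max=${s.max}")

    sc.stop()
  }
}
```

If the RDD holds tuples or rows rather than plain numbers, mapping to the numeric column first (for example `rdd.map(_._3.toDouble)`) makes `stats()` available.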

How to calculate row by row and output results in Spark

2015-10-19 Thread Shepherd
Hi all, I am new to Spark and Scala, and I have a question about doing a calculation. I am using "groupBy" to generate key-value pairs, where each value points to a subset of the original RDD. The RDD has four columns, and each subset may have a different number of rows. For example, the original code like

Filter RDD

2015-10-19 Thread Shepherd
Hi all, I have a very simple question. I have an RDD, say r1, which contains 5 columns of both String and Int types. How can I get a sub-RDD containing only the rows whose second column equals a given string (s)? Thanks a lot.
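A minimal sketch of the filter described above, assuming each row of r1 is a 5-tuple whose second field is a String. The post does not say how the columns are typed, so the `Row5` alias, the sample data, and the helper name `filterBySecond` are all assumptions made for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object FilterRddExample {
  // Assumed row shape: five columns mixing Int and String.
  type Row5 = (Int, String, Int, String, Int)

  // Hypothetical helper: keeps only the rows whose second column equals s.
  def filterBySecond(rdd: RDD[Row5], s: String): RDD[Row5] =
    rdd.filter { case (_, second, _, _, _) => second == s }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("filter-rdd").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Illustrative data standing in for r1.
    val r1 = sc.parallelize(Seq(
      (1, "a", 10, "x", 100),
      (2, "b", 20, "y", 200),
      (3, "a", 30, "z", 300)
    ))

    val sub = filterBySecond(r1, "a")
    sub.collect().foreach(println)

    sc.stop()
  }
}
```

`filter` is a lazy transformation, so nothing is computed until an action such as `collect()` runs. If the rows are arrays of strings instead of tuples, the same idea applies with a predicate like `row => row(1) == s`.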

Question of RDD in calculation

2015-10-16 Thread Shepherd
Hi all, I am new to Spark, and I have a question about working with RDDs. I’ve converted the RDDs to DataFrames, so there are two DataFrames: DF1 and DF2. DF1 contains: userID, time, dataUsage, duration. DF2 contains: userID. Each userID has multiple rows in DF1. DF2 has distinct userIDs, and I would like to compute the