[jira] [Updated] (SPARK-23077) Apache Structured Streaming: Bulk/Batch write support for Hive using streaming dataset

Pravin Agrawal (JIRA) Mon, 15 Jan 2018 03:46:09 -0800

     [ https://issues.apache.org/jira/browse/SPARK-23077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Pravin Agrawal updated SPARK-23077:
-----------------------------------
    Description: 
Using Apache Spark 2.2 Structured Streaming, create a program that reads data from Kafka and writes it to Hive.
Data arrives on the Kafka topic at 100 records/sec and must be written to a Hive table.
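At 100 records/sec, writing each record with its own statement means 8,640,000 writes per day, which is why a bulk (per micro-batch) write matters here. A minimal sizing sketch, assuming a steady arrival rate and assuming every write statement adds at least one new file to the table directory (so the write count roughly tracks the small-file count). `WriteBatchSizing` and its methods are illustrative names only.

```java
// Rough sizing of per-record vs. micro-batched writes at a steady arrival rate.
// Assumption: each write statement adds at least one new file to the table,
// so writes per day is also a rough lower bound on new files per day.
final class WriteBatchSizing {
    static final long SECONDS_PER_DAY = 86_400L;

    // Rows carried by one micro-batch when records arrive at a steady rate.
    static long rowsPerBatch(long recordsPerSecond, long triggerSeconds) {
        return recordsPerSecond * triggerSeconds;
    }

    // Batched writes issued per day for a given trigger interval.
    static long writesPerDay(long triggerSeconds) {
        return SECONDS_PER_DAY / triggerSeconds;
    }

    // Per-record writes: one statement for every incoming record.
    static long perRecordWritesPerDay(long recordsPerSecond) {
        return recordsPerSecond * SECONDS_PER_DAY;
    }
}
```

For example, a 10-second trigger gives `rowsPerBatch(100, 10)` = 1,000 rows per write and `writesPerDay(10)` = 8,640 writes per day, instead of 8,640,000.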

**Hive Table Created:**

CREATE TABLE demo_user (timeaa BIGINT, numberbb INT, decimalcc DOUBLE, stringdd STRING, booleanee BOOLEAN) STORED AS ORC;

**Insert via Manual Hive Query:**

INSERT INTO TABLE demo_user VALUES (1514133139123, 14, 26.4, 'pravin', true);

**Insert via Spark Structured Streaming code:**

import org.apache.spark.SparkConf;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkConf conf = new SparkConf();
conf.setAppName("testing");
conf.setMaster("local[2]");
// Connect to the Hive metastore so Spark can resolve demo_user
conf.set("hive.metastore.uris", "thrift://localhost:9083");
SparkSession session =
    SparkSession.builder().config(conf).enableHiveSupport().getOrCreate();

// workaround START: code to insert static data into hive
String insertQuery =
    "INSERT INTO TABLE demo_user VALUES (1514133139123, 14, 26.4, 'pravin', true)";
session.sql(insertQuery);
// workaround END

// Solution START
// readFromKafka(...) is a private method reading data from Kafka's 'xyz' topic
Dataset<Row> dataset = readFromKafka(session);

// some code which writes dataset into hive table demo_user
// Solution END
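The "Solution" placeholder above is the open question of this issue: in Spark 2.2, `writeStream` offers no sink that targets a Hive table. From Spark 2.4, `DataStreamWriter.foreachBatch` hands each micro-batch to user code as a regular `Dataset`, which can then be appended with `batch.write().insertInto("demo_user")`. Staying with the string-based workaround instead, the sketch below covers only the bulk part: turning one batch of rows into a single `INSERT ... VALUES` statement rather than one statement per record. It assumes that multi-row `VALUES` lists and backslash-escaped string literals are accepted by the SQL engine (assumed here for Hive 0.14+ and Spark SQL). `DemoUserBulkInsert`, `DemoUser`, `build` and `quote` are hypothetical names, not Spark or Hive APIs.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper, not a Spark or Hive API: renders a batch of demo_user rows
// as ONE multi-row INSERT ... VALUES statement instead of one statement per record.
final class DemoUserBulkInsert {

    // One demo_user row; fields follow the column order of the CREATE TABLE above.
    static final class DemoUser {
        final long timeaa;
        final int numberbb;
        final double decimalcc;
        final String stringdd;
        final boolean booleanee;

        DemoUser(long timeaa, int numberbb, double decimalcc, String stringdd, boolean booleanee) {
            this.timeaa = timeaa;
            this.numberbb = numberbb;
            this.decimalcc = decimalcc;
            this.stringdd = stringdd;
            this.booleanee = booleanee;
        }

        // Renders the row as a tuple such as (1514133139123, 14, 26.4, 'pravin', true).
        String toValuesTuple() {
            return "(" + timeaa + ", " + numberbb + ", " + decimalcc + ", "
                    + quote(stringdd) + ", " + booleanee + ")";
        }
    }

    // Renders a string literal, or NULL for a missing value.
    // Assumption: backslash escaping; backslashes are escaped before quotes.
    static String quote(String s) {
        if (s == null) {
            return "NULL";
        }
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Builds a single INSERT statement covering every row of the batch.
    static String build(String table, List<DemoUser> rows) {
        if (rows.isEmpty()) {
            throw new IllegalArgumentException("batch must contain at least one row");
        }
        StringJoiner values = new StringJoiner(", ");
        for (DemoUser row : rows) {
            values.add(row.toValuesTuple());
        }
        return "INSERT INTO TABLE " + table + " VALUES " + values;
    }
}
```

Where a batch of rows is available on the driver, as in the workaround above, it would be run as `session.sql(DemoUserBulkInsert.build("demo_user", rows))`. Building SQL text suits small batches; for larger ones, writing the batch `Dataset` directly avoids the round trip through strings.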

  was:
Using Apache Spark 2.2 Structured Streaming, I am creating a program that reads data from Kafka and writes it to Hive.
I am looking for a way to write, in bulk, the data arriving on the Kafka topic at 100 records/sec.


> Apache Structured Streaming: Bulk/Batch write support for Hive using streaming dataset
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-23077
>                 URL: https://issues.apache.org/jira/browse/SPARK-23077
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.2.0
>            Reporter: Pravin Agrawal
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
