Fwd: iceberg queries

2023-06-15 Thread Gaurav Agarwal
Hi Team, Sample MERGE query:

    df.createOrReplaceTempView("source")

    MERGE INTO iceberg_hive_cat.iceberg_poc_db.iceberg_tab target
    USING (SELECT * FROM source)
    ON target.col1 = source.col1   -- this is my bucket column
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *

The source dataset
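
For readers following along, here is a minimal, hedged sketch of how such a MERGE can be issued from a Java Spark job. It assumes a SparkSession configured with the Iceberg runtime, the iceberg_hive_cat catalog, and Iceberg's SQL extensions (required for MERGE); the source Parquet path is a hypothetical placeholder, not a detail from the original message.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class IcebergMergeExample {
        public static void main(String[] args) {
            // Assumes the iceberg_hive_cat catalog and Iceberg SQL extensions
            // are already configured in spark-defaults / the submit command.
            SparkSession spark = SparkSession.builder()
                    .appName("iceberg-merge")
                    .getOrCreate();

            // Hypothetical source location; the thread loads the source from Parquet files.
            Dataset<Row> df = spark.read().parquet("/path/to/source/parquet");
            df.createOrReplaceTempView("source");

            spark.sql(
                "MERGE INTO iceberg_hive_cat.iceberg_poc_db.iceberg_tab AS target " +
                "USING (SELECT * FROM source) AS source " +
                "ON target.col1 = source.col1 " +          // col1 is the bucket column
                "WHEN MATCHED THEN UPDATE SET * " +
                "WHEN NOT MATCHED THEN INSERT *");
        }
    }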

Re: Spark using iceberg

2023-06-15 Thread Gaurav Agarwal
> Hi, I am using Spark with Iceberg, updating a table with 1700 columns. We are loading 0.6 million rows from Parquet files (in future it will be 16 million rows) and trying to update the data in a table which has 16 buckets, using the default partitioner of Spark. Also we don't do

Spark using iceberg

2023-06-15 Thread Gaurav Agarwal
Hi, I am using Spark with Iceberg, updating a table with 1700 columns. We are loading 0.6 million rows from Parquet files (in future it will be 16 million rows) and trying to update the data in a table which has 16 buckets, using the default partitioner of Spark. Also we don't do any
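
For context on what "16 buckets" means here, below is a hedged sketch of the kind of Iceberg DDL that produces such a table: a bucket(16, ...) partition transform on the join column. Only col1 and the catalog/table names come from the thread; the remaining columns are placeholders for the roughly 1700-column schema.

    import org.apache.spark.sql.SparkSession;

    public class CreateBucketedIcebergTable {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("create-bucketed-table").getOrCreate();

            // bucket(16, col1): rows are hashed on col1 into 16 buckets, which is
            // why the MERGE in the other message of this thread joins on col1.
            spark.sql("CREATE TABLE IF NOT EXISTS iceberg_hive_cat.iceberg_poc_db.iceberg_tab ("
                    + " col1 BIGINT, col2 STRING, col3 STRING" // placeholders for the ~1700 columns
                    + ") USING iceberg "
                    + "PARTITIONED BY (bucket(16, col1))");
        }
    }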

Re: OFFICIAL USA REPORT TODAY India Most Dangerous : USA Religious Freedom Report out TODAY

2020-04-29 Thread Gaurav Agarwal
Spark moderators, please suppress this user. Unnecessary spam, or has the Apache Spark account been hacked? On Wed, Apr 29, 2020, 11:56 AM Zahid Amin wrote: > How can it be rumours? > Of course you want to suppress me. > Suppress USA official Report out TODAY. > > Sent: Wednesday, April 29, 2020 at

Re: Enrichment with static tables

2017-02-16 Thread Gaurav Agarwal
r do a select on the columns you want to produce your transformed dataframe. > Not sure if I understand the question though; if the goal is just an end-state transformed dataframe, that can easily be done. > Regards, Sam > On Wed, Feb 15, 2017 at 6:34 PM, G

Enrichment with static tables

2017-02-15 Thread Gaurav Agarwal
Hello, we want to enrich our Spark RDD, which is loaded with multiple columns and multiple rows. It needs to be enriched with 3 different tables that I loaded as 3 different Spark DataFrames. Can we write some logic in Spark so I can enrich my Spark RDD with these different static tables? Thanks

Regarding transformation with dataframe

2017-02-15 Thread Gaurav Agarwal
Hello, I have loaded 3 DataFrames from 3 different static tables. Now I got the CSV file and, with the help of Spark, loaded the CSV into a DataFrame and registered it as a temporary table named "Employee". Now I need to enrich the columns in the Employee DF and query any of the 3 static tables respectively with
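
A minimal sketch of one common way to do this enrichment: register the Employee DataFrame and the three static DataFrames as temp views and join them in SQL (DataFrame.join works equally well). The file paths, lookup-table names and join keys below are hypothetical, since the thread does not give them.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class EnrichEmployee {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("enrich-employee").getOrCreate();

            // CSV loaded and registered as the "Employee" temp table, as in the thread.
            spark.read().option("header", "true").csv("/path/to/employee.csv")
                 .createOrReplaceTempView("Employee");

            // Stand-ins for the three static-table DataFrames (sources are assumed).
            spark.read().parquet("/path/to/static_dept").createOrReplaceTempView("dept");
            spark.read().parquet("/path/to/static_location").createOrReplaceTempView("location");
            spark.read().parquet("/path/to/static_grade").createOrReplaceTempView("grade");

            Dataset<Row> enriched = spark.sql(
                "SELECT e.*, d.dept_name, l.city, g.grade_desc " +
                "FROM Employee e " +
                "LEFT JOIN dept d     ON e.dept_id  = d.dept_id " +
                "LEFT JOIN location l ON e.loc_id   = l.loc_id " +
                "LEFT JOIN grade g    ON e.grade_id = g.grade_id");

            enriched.show();
        }
    }

If the static tables are small, broadcasting them (for example with a broadcast join hint) keeps the joins map-side, which is the usual pattern for enriching a large dataset with small lookup tables.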

Re: Spark on Windows platform

2016-02-29 Thread Gaurav Agarwal
> Hi, I am running Spark on Windows, but a standalone one. Use this code:
>
>     SparkConf conf = new SparkConf().setMaster("local[1]").setAppName("spark").setSparkHome("c:/spark/bin/spark-submit.cmd");
>
> where sparkHome is the path where you extracted your Spark binaries, down to bin/*.cmd. You

Re: Stored proc with spark

2016-02-16 Thread Gaurav Agarwal
s.exit()
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> NOTE: The information in this emai

Stored proc with spark

2016-02-16 Thread Gaurav Agarwal
Hi, can I load data into Spark from an Oracle stored proc? Thanks
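
A hedged note with a sketch: Spark's JDBC data source reads tables or SELECT subqueries, not PL/SQL stored procedures directly, so a common workaround is to have the procedure populate a table (or expose a view/table function) and read that via JDBC. The URL, credentials and object names below are hypothetical placeholders.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class OracleJdbcRead {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("oracle-jdbc").getOrCreate();

            Dataset<Row> df = spark.read()
                .format("jdbc")
                .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1")   // placeholder URL
                .option("driver", "oracle.jdbc.OracleDriver")
                .option("user", "scott")
                .option("password", "tiger")
                // A SELECT wrapped as a derived table, e.g. the table the procedure filled.
                .option("dbtable", "(SELECT * FROM proc_output_table) t")
                .load();

            df.show();
        }
    }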

Dataframes

2016-02-11 Thread Gaurav Agarwal
Hi, can we load 5 DataFrames for 5 tables in one SparkContext? I am asking because we have to give:

    Map<String, String> options = new HashMap<>();
    options.put("driver", "");
    options.put("url", "");
    options.put("dbtable", "");

I can give only one table query at a time in the dbtable option. How will I register multiple queries
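
A minimal sketch of the usual answer to "how will I register multiple queries": call the JDBC reader once per table (or per wrapped query) and register each result as its own temp view; a single SparkContext/SparkSession can hold many DataFrames. Connection details and table names are hypothetical.

    import java.util.Arrays;
    import java.util.List;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class MultiTableJdbcLoad {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("multi-table-jdbc").getOrCreate();

            List<String> tables = Arrays.asList("TABLE1", "TABLE2", "TABLE3", "TABLE4", "TABLE5");
            for (String table : tables) {
                Dataset<Row> df = spark.read()
                    .format("jdbc")
                    .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1")  // placeholder
                    .option("driver", "oracle.jdbc.OracleDriver")
                    .option("dbtable", table)        // or "(SELECT ...) t" for a query
                    .load();
                df.createOrReplaceTempView(table.toLowerCase());
            }

            // The five views can now be queried or joined with each other.
            spark.sql("SELECT COUNT(*) FROM table1").show();
        }
    }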

Re: cache DataFrame

2016-02-11 Thread Gaurav Agarwal
tes the optimized execution plan, according to the query execution tree. This is the point when data gets materialized. > On Feb 11, 2016, at 11:20 AM, Gaurav Agarwal <gaurav130...@gmail.com> wrote: > Hi

cache DataFrame

2016-02-11 Thread Gaurav Agarwal
Hi, when will the DataFrame load the table into memory when it reads from Hive/Phoenix or from any other database? There are two points where I need one piece of information: will the table be loaded into memory or cached at point 1 or at point 2 below?
1. DataFrame df = sContext.load("jdbc", "(select * from
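
A minimal sketch of the lazy-evaluation point being asked about: neither the JDBC load (point 1) nor cache() (point 2) moves any data by itself; the table is actually read, and then kept in memory, the first time an action such as count() or show() runs. The connection details below are placeholders.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class CacheMaterialization {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("cache-demo").getOrCreate();

            // Point 1: defines the scan; nothing is fetched yet (lazy).
            Dataset<Row> df = spark.read()
                .format("jdbc")
                .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1")   // placeholder
                .option("dbtable", "(SELECT * FROM EMP) t")
                .load();

            // Point 2: only marks the DataFrame for caching; still lazy.
            df.cache();

            // First action: the query runs and the result is materialized into
            // the cache; subsequent actions reuse the cached data.
            long rows = df.count();
            System.out.println("cached rows: " + rows);
        }
    }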

Re: Problem using limit clause in spark sql

2015-12-23 Thread Gaurav Agarwal
I am going to have the above scenario without using the limit clause; will it then check among all the partitions? On Dec 24, 2015 9:26 AM, "汪洋" wrote: > I see. > Thanks. > On Dec 24, 2015, at 11:44 AM, Zhan Zhang wrote: > There has to be a central
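
A small, hedged illustration of the point under discussion: a LIMIT is applied per partition first, and a single final (central) step then trims the combined result to N rows, which is visible in the physical plan. The numbers below are arbitrary.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class LimitAcrossPartitions {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .master("local[4]").appName("limit-demo").getOrCreate();

            // 1,000,000 ids spread over 8 partitions.
            Dataset<Row> df = spark.range(0, 1000000, 1, 8).toDF("id");
            df.createOrReplaceTempView("t");

            Dataset<Row> limited = spark.sql("SELECT * FROM t LIMIT 10");
            // The plan typically shows a per-partition (local) limit feeding a
            // single global/collect limit, i.e. the "central" step mentioned above.
            limited.explain();
            System.out.println(limited.count());   // 10
        }
    }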

Spark data frame

2015-12-22 Thread Gaurav Agarwal
We are able to retrieve a DataFrame by filtering the RDD object. I need to convert that DataFrame into Java POJOs. Any idea how to do that?
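
A hedged sketch of two common ways to turn DataFrame rows into Java POJOs. The Employee bean, its fields and the input path are hypothetical; in the Spark 1.x era of this thread only the row-mapping variant applies, while Spark 2.x+ can bind rows to a bean with Encoders.bean.

    import java.io.Serializable;
    import java.util.List;
    import org.apache.spark.api.java.function.MapFunction;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class DataFrameToPojo {

        public static class Employee implements Serializable {
            private long id;
            private String name;
            public long getId() { return id; }
            public void setId(long id) { this.id = id; }
            public String getName() { return name; }
            public void setName(String name) { this.name = name; }
        }

        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("df-to-pojo").getOrCreate();
            Dataset<Row> df = spark.read().json("/path/to/employees.json");  // placeholder source

            // Option 1 (Spark 2.x+): bind the rows to the bean via an encoder.
            Dataset<Employee> typed = df.as(Encoders.bean(Employee.class));
            List<Employee> pojos = typed.collectAsList();

            // Option 2: map each Row by hand (the same idea works on the old
            // DataFrame API via df.javaRDD().map(...)).
            Dataset<Employee> mapped = df.map((MapFunction<Row, Employee>) row -> {
                Employee e = new Employee();
                e.setId(row.getLong(row.fieldIndex("id")));
                e.setName(row.getString(row.fieldIndex("name")));
                return e;
            }, Encoders.bean(Employee.class));

            System.out.println(pojos.size() + " / " + mapped.count());
        }
    }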

Regarding spark in memory

2015-12-22 Thread Gaurav Agarwal
If I have 3 more nodes in the cluster and Spark is running there: if I load the records from Phoenix into a Spark RDD and fetch the records from Spark through a DataFrame, I want to know, since Spark is distributed, whether I can fetch the records from any of the nodes and the records present on any node will be retrieved

Spark Streaming: how to work with partitions, how to create partitions

2015-08-22 Thread Gaurav Agarwal
1. How to work with partitions in Spark Streaming from Kafka? 2. How to create partitions in Spark Streaming from Kafka when I send the messages from a Kafka topic having three partitions? Spark will listen for the messages when I say KafkaUtils.createStream or createDirectStream with local[4]. Now I
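
A hedged sketch of how the partition relationship is usually observed, using the direct (receiver-less) Kafka integration, where each Kafka partition maps 1:1 to a partition of every batch RDD, so a 3-partition topic yields 3 RDD partitions regardless of local[4]. This uses the newer spark-streaming-kafka-0-10 API rather than the 2015-era createStream; the broker address, group id and topic name are placeholders.

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class KafkaPartitionDemo {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setMaster("local[4]").setAppName("kafka-partition-demo");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "localhost:9092");   // placeholder broker
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "partition-demo");
            kafkaParams.put("auto.offset.reset", "latest");

            JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(Arrays.asList("my-topic"), kafkaParams));

            // For a 3-partition topic this prints 3: one RDD partition per Kafka partition.
            stream.foreachRDD(rdd ->
                System.out.println("partitions in this batch: " + rdd.getNumPartitions()));

            jssc.start();
            jssc.awaitTermination();
        }
    }

With the receiver-based createStream mentioned in the message, the mapping is different: the partitions of each batch come from the receiver's block interval rather than from the Kafka partitions, which is a common source of the confusion asked about here.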

Re: spark kafka partitioning

2015-08-21 Thread Gaurav Agarwal
of the Spark API I have to see to find out. On 8/21/15, Gaurav Agarwal <gaurav130...@gmail.com> wrote: Hello, regarding Spark Streaming and Kafka partitioning: when I send messages on a Kafka topic with 3 partitions and listen on a Kafka receiver with local[4], how will I come to know in Spark

spark kafka partitioning

2015-08-20 Thread Gaurav Agarwal
Hello, regarding Spark Streaming and Kafka partitioning: when I send messages on a Kafka topic with 3 partitions and listen on a Kafka receiver with local[4], how will I come to know in Spark Streaming that different DStreams are created according to the partitions of the Kafka messages? Thanks