Hi Udit,
You can use the scripts provided by Yaswanth for reading/writing the Hudi
dataset using PySpark.

I need to understand your requirements a little more before adding formal support.

Are you looking for a Python command-line tool similar to DeltaStreamer
(https://hudi.apache.org/docs/writing_data.html#deltastreamer) for both the Hudi
reader and writer, or are you interested in using the Data Source APIs, like this?

hudiOpts = {
    "hoodie.datasource.write.recordkey.field": "uuid",
    "hoodie.datasource.write.precombine.field": "update_timestamp",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.table.name": "tmp.stock_ticker"
}
basePath = "/tmp/stock_ticker"

# Upsert inputDF into the Hudi dataset at basePath.
# The parentheses let the method chain span several lines in Python.
(inputDF.write.format("org.apache.hudi")
    .options(**hudiOpts)
    .mode("append")
    .save(basePath))

# Read the dataset back. Reads go through the SparkSession, not a DataFrame;
# this assumes an active session named spark.
outputDF = spark.read.format("org.apache.hudi").load(basePath + "/*")
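
If the Data Source route is what you need, the write options above could be assembled with a small helper. This is only a rough sketch: the function name hudi_write_options and its parameters are made up for illustration and are not part of Hudi. The option keys are the ones shown above, and the allowed operations are the upsert, insert and bulk_insert write operations.

```python
def hudi_write_options(table_name, record_key, precombine_field, operation="upsert"):
    """Build the dict passed to DataFrameWriter.options(**opts).

    Hypothetical helper for illustration; not a Hudi API.
    """
    allowed = {"upsert", "insert", "bulk_insert"}
    if operation not in allowed:
        raise ValueError(f"unsupported write operation: {operation!r}")
    return {
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": operation,
        "hoodie.table.name": table_name,
    }


opts = hudi_write_options("tmp.stock_ticker", "uuid", "update_timestamp")

# With a DataFrame named inputDF in scope, the write would then look like:
# inputDF.write.format("org.apache.hudi").options(**opts).mode("append").save("/tmp/stock_ticker")
```

Rejecting an unknown operation name up front means a typo fails before any Spark job is launched.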

Thanks,
Vinoth


On 2020/04/09 00:39:49, Vinoth Chandar <vin...@apache.org> wrote: 
> Thanks Udit!  I also believe there will be a PR soon for pySpark and we
> should have formal support next release.
> 
> 
> 
> On Wed, Apr 8, 2020 at 4:49 PM Mehrotra, Udit <udi...@amazon.com.invalid>
> wrote:
> 
> > Hi Yaswanth,
> >
> > PFA an example I prepared some time back which can help you get started.
> >
> > Thanks,
> > Udit
> >
> > On 4/8/20, 3:21 PM, "Atluri Yaswanth" <yaswanth.atl...@gmail.com> wrote:
> >
> >     Hi Team,
> >
> >     I would like to know whether there are any PySpark scripts to upsert
> >     data into a Hudi dataset.
> >
> >     I am working with Scala now, but I want to use PySpark because my data
> >     is not in a good format (I need to use various libraries inside).
> >
> >     Thanks in advance
> >     yaswanth
> >
> >
> >
> 
