Re: Re: spark streaming and kinesis integration

2023-04-11 Thread Lingzhe Sun
Hi Mich,

FYI, we have been using the Spark operator 
(https://github.com/GoogleCloudPlatform/spark-on-k8s-operator) to build 
stateful Structured Streaming on k8s for a year. We haven't tested it the 
non-operator way.
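For reference, a minimal sketch of what such a deployment looks like with the operator (all field values below are illustrative placeholders, not our production setup):

```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: streaming-app
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "apache/spark:3.3.0"
  mainClass: com.example.StreamingJob          # illustrative
  mainApplicationFile: "local:///opt/app/streaming-job.jar"
  sparkVersion: "3.3.0"
  restartPolicy:
    type: Always          # keep the streaming driver running
  driver:
    cores: 1
    memory: "1g"
    serviceAccount: spark
  executor:
    instances: 2
    cores: 1
    memory: "2g"
```

The operator then submits and supervises the driver pod for you, which is what makes long-running streaming jobs practical on k8s.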

Besides that, the main contributor of the Spark operator, Yinan Li, has been 
inactive for quite a long time. We are somewhat worried that the project may 
eventually fall behind as k8s evolves. So if anyone is interested, please 
support the project.



Lingzhe Sun
Hirain Technologies
 
From: Mich Talebzadeh
Date: 2023-04-11 02:06
To: Rajesh Katkar
CC: user
Subject: Re: spark streaming and kinesis integration
What I said was this
"In so far as I know k8s does not support spark structured streaming?"

So it is an open question; I just recalled it. I have not tested it myself. I 
know Structured Streaming works on a Google Dataproc cluster, but I have not 
seen any official link that says Spark Structured Streaming is supported on k8s.

HTH

Mich Talebzadeh,
Lead Solutions Architect/Engineering Lead
Palantir Technologies
London
United Kingdom

   view my Linkedin profile

 https://en.everybodywiki.com/Mich_Talebzadeh
 
Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction. 
 


On Mon, 10 Apr 2023 at 06:31, Rajesh Katkar  wrote:
Do you have any link or ticket that justifies the claim that k8s does not 
support Spark streaming?

On Thu, 6 Apr, 2023, 9:15 pm Mich Talebzadeh,  wrote:
Do you have a high level diagram of the proposed solution?

In so far as I know k8s does not support spark structured streaming?

Mich Talebzadeh


On Thu, 6 Apr 2023 at 16:40, Rajesh Katkar  wrote:
The use case is that we want to read/write Kinesis streams using k8s.
Officially, I could not find a Kinesis connector or reader for Spark like the 
one it has for Kafka.

Checking here whether anyone has used the Kinesis and Spark streaming 
combination.

On Thu, 6 Apr, 2023, 7:23 pm Mich Talebzadeh,  wrote:
Hi Rajesh,

What is the use case for Kinesis here? I have not used it personally. Which 
use case does it concern?

https://aws.amazon.com/kinesis/

Can you use something else instead?

HTH

Mich Talebzadeh


On Thu, 6 Apr 2023 at 13:08, Rajesh Katkar  wrote:
Hi Spark Team,
We need to read/write Kinesis streams using Spark streaming.
We checked the official documentation at 
https://spark.apache.org/docs/latest/streaming-kinesis-integration.html
and it does not mention a Kinesis connector. An alternative is 
https://github.com/qubole/kinesis-sql, which is no longer active and has been 
handed over to https://github.com/roncemer/spark-sql-kinesis.
Also, according to SPARK-18165, Spark officially does not have any Kinesis 
connector.
We have a few questions below; it would be great if you could answer them:
1. Does Spark officially provide any Kinesis connector with 
readstream/writestream, and does it endorse any connector for production use 
cases?
2. The documentation above does not mention how to write to Kinesis. The 
method there uses DynamoDB as the default checkpoint store; can we override 
it?
3. We use RocksDB as a state store, but when we ran an application following 
the official documentation, the RocksDB configurations were not effective. 
Can you please confirm whether RocksDB is not applicable in these cases?
4. RocksDB does, however, work with the Qubole connector; do you have any 
plan to release a Kinesis connector?
Please help/recommend a good, stable Kinesis connector or some pointers 
around it.
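For readers following this thread, here is a sketch of what the community spark-sql-kinesis connector's structured-streaming API looks like. The option keys and values below are illustrative placeholders reconstructed from memory of the qubole/roncemer README, not verified against a live setup:

```python
# Option map for the community "kinesis" structured streaming source.
# All values are illustrative placeholders - substitute your own.
kinesis_options = {
    "streamName": "my-stream",
    "endpointUrl": "https://kinesis.us-east-1.amazonaws.com",
    "awsUseInstanceProfile": "true",
    "startingposition": "TRIM_HORIZON",
}

# With a SparkSession `spark` and the connector jar on the classpath,
# the read side would look roughly like:
#   df = spark.readStream.format("kinesis").options(**kinesis_options).load()
#   query = df.writeStream.format("console").start()
print(sorted(kinesis_options))
```

This is only a shape sketch; check the connector's own README for the exact option names supported by the version you deploy.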


Re: [SparkSQL, SparkUI, RESTAPI] How to extract the WholeStageCodeGen ids from SparkUI

2023-04-11 Thread Chitral Verma
Try EXPLAIN CODEGEN on your DF and then parse the string.
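For example (a plain-Python sketch, assuming you have captured the output of EXPLAIN CODEGEN as a string; the sample plan text below is an abbreviated illustration, not real output from your application):

```python
import re

# Abbreviated, illustrative fragment of EXPLAIN CODEGEN-style output;
# in practice you would capture this from spark.sql("EXPLAIN CODEGEN ...").
plan_text = """
Found 2 WholeStageCodegen subtrees.
== Subtree 1 / 2 ==
*(1) Filter isnotnull(id#0L)
== Subtree 2 / 2 ==
*(2) HashAggregate(keys=[], functions=[count(1)])
"""

# WholeStageCodegen ids appear as the parenthesised numbers in "*(n) Operator".
codegen_ids = sorted({int(m) for m in re.findall(r"\*\((\d+)\)", plan_text)})
print(codegen_ids)  # -> [1, 2]
```

Parsing the plan string this way avoids scraping the DAG visualization, though the exact text format can vary between Spark versions.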

On Fri, 7 Apr, 2023, 3:53 pm Chenghao Lyu,  wrote:

> Hi,
>
> The detailed stage page shows the involved WholeStageCodegen IDs in its
> DAG visualization in the Spark UI when running a Spark SQL query (e.g., under
> the link
> node:18088/history/application_1663600377480_62091/stages/stage/?id=1=0).
>
> However, I have trouble extracting the WholeStageCodegen ids from the DAG
> visualization via the RESTAPIs. Is there any other way to get the
> WholeStageCodegen Ids information for each stage automatically?
>
> Cheers,
> Chenghao
>


Re: Non string type partitions

2023-04-11 Thread Chitral Verma
Because the name of the directory cannot be an object, it has to be a
string to create partitioned dirs like "date=2023-04-10"
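The point can be illustrated outside Spark with a plain-Python sketch of Hive-style partition path encoding (function and column names here are illustrative):

```python
import datetime

def partition_dir(col: str, value) -> str:
    # Hive-style partition directories encode the value as a string,
    # e.g. "date=2023-04-10", regardless of the column's logical type.
    return f"{col}={value}"

d = datetime.date(2023, 4, 10)
print(partition_dir("date", d))  # -> date=2023-04-10
```

Because only the string form survives in the directory name, metastore-level partition filtering ends up operating on strings.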

On Tue, 11 Apr, 2023, 8:27 pm Charles vinodh,  wrote:

>
> Hi Team,
>
> We are running into the below error when we are trying to run a simple
> query on a partitioned table in Spark.
>
> *MetaException(message:Filtering is supported only on partition keys of type 
> string)*
>
>
> Our partition column has been set to type *date* instead of string, and the
> query is a very simple SQL statement as shown below.
>
> *SELECT * FROM my_table WHERE partition_col = date '2023-04-11'*
>
> Any idea why Spark mandates partition columns to be of type string? Is
> there a recommended workaround for this issue?
>
>
>


Non string type partitions

2023-04-11 Thread Charles vinodh
Hi Team,

We are running into the below error when we are trying to run a simple
query on a partitioned table in Spark.

*MetaException(message:Filtering is supported only on partition keys
of type string)*


Our partition column has been set to type *date* instead of string, and the
query is a very simple SQL statement as shown below.

*SELECT * FROM my_table WHERE partition_col = date '2023-04-11'*

Any idea why Spark mandates partition columns to be of type string? Is
there a recommended workaround for this issue?
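One workaround that is sometimes suggested, sketched below, assumes the error comes from Hive-metastore partition-filter pushdown (which only supports string keys). The table name and value are illustrative; verify the config against your Spark/Hive versions before relying on it:

```sql
-- Sketch only: compare the partition key as a string literal so the
-- metastore-level filter stays string-typed (value is illustrative).
SELECT * FROM my_table WHERE partition_col = '2023-04-11';

-- Or disable metastore-side partition pruning for the session, at the
-- cost of fetching all partition metadata:
SET spark.sql.hive.metastorePartitionPruning=false;
```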