Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-19 Thread JackyLee
+1

Re: spark-on-k8s is still experimental?

2020-08-03 Thread JackyLee
+1. It has worked well in our company, and we have used it to support online services since March of this year.

Re: Catalog API for Partition

2020-07-21 Thread JackyLee
The `partitioning` in `TableCatalog.createTable` is a partition schema for the table, which doesn't contain the metadata of any actual partition. Besides, the actual partition metadata may contain many partition schemas, such as Hive partitions. Thus I created a `TablePartition` to contain
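For illustration, a minimal Scala sketch of what such a partition-level catalog API could look like; the `TablePartition` name comes from the thread, but the field names and the `SupportsPartitions` mix-in are assumptions made for this sketch, not the proposed API:

```scala
import java.util
import org.apache.spark.sql.connector.catalog.Identifier

// Hypothetical: metadata for one concrete partition (e.g. dt=2020-07-21),
// as opposed to the Transform-based partition *schema* passed to createTable.
case class TablePartition(
    spec: util.Map[String, String],       // partition column -> value
    properties: util.Map[String, String]  // location, format, stats, ...
)

// Hypothetical catalog mix-in for managing partition metadata.
trait SupportsPartitions {
  def createPartition(ident: Identifier, partition: TablePartition): Unit
  def dropPartition(ident: Identifier, spec: util.Map[String, String]): Boolean
  def listPartitions(ident: Identifier): Array[TablePartition]
}
```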

Re: Catalog API for Partition

2020-07-17 Thread JackyLee
Hi, Wenchen. Thanks for your attention and reply. Firstly, these Partition Catalog APIs are not specific to Hive; they can be used with LakeHouse, MySQL, or other sources that support partitions. Secondly, these Partition Catalog APIs are designed only for better data management, not for speed

Catalog API for Partition

2020-07-16 Thread JackyLee
Hi devs, In order to support partition commands for DataSourceV2 and Lakehouse, I'm trying to add a Partition API for multiple catalogs. These are widely used APIs in MySQL, Hive, and other datasources; we can use them to manage partition metadata in Lakehouse. JIRA:

Resolve _temporary directory uncleaned

2020-07-16 Thread JackyLee
Hi devs, In InsertIntoHiveTable and InsertIntoHiveDirCommand, we use deleteExternalTmpPath to clean temporary directories after the job is committed, and cancel deleteOnExit if it succeeded. But sometimes (e.g., when speculative tasks are enabled), temporary directories may be left uncleaned. This happens
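A minimal Scala sketch of the defensive pattern being described (remove the staging directory whether or not the write succeeded, so speculative attempts cannot leave _temporary behind); the helper name and signature are illustrative, not the actual InsertIntoHiveTable code:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Illustrative only: run the write, then always try to delete the staging
// directory in a finally block instead of relying on deleteOnExit.
def withStagingDir[T](stagingDir: Path, hadoopConf: Configuration)(body: Path => T): T = {
  val fs: FileSystem = stagingDir.getFileSystem(hadoopConf)
  try {
    body(stagingDir)
  } finally {
    if (fs.exists(stagingDir)) {
      fs.delete(stagingDir, /* recursive = */ true)
    }
  }
}
```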

Re: Apache Spark 3.1 Feature Expectation (Dec. 2020)

2020-06-29 Thread JackyLee
Thank you for putting this forward. Can we put the support of view and partition catalogs in version 3.1? AFAICT, these are great features in DSv2 and Catalog. With these, we can work well with warehouses such as Delta or Hive. https://github.com/apache/spark/pull/28147

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-05-11 Thread JackyLee
+1. Agree with Xiao Li and Jungtaek Lim. This seems to be controversial and cannot be done in a short time. It is necessary to choose option 1 to unblock Spark 3.0 and support it in 3.1.

Re: [DISCUSS] Supporting hive on DataSourceV2

2020-03-24 Thread JackyLee
Hi Blue, I have created a JIRA for supporting Hive on DataSourceV2; we can associate specific modules with this JIRA. https://issues.apache.org/jira/browse/SPARK-31241 Could you provide a Google doc for the current design, so that we can discuss and improve it in detail here?

Re: [DISCUSS] Supporting hive on DataSourceV2

2020-03-24 Thread JackyLee
Glad to hear that you have already supported it; that is just the thing we are doing. And the exceptions you mentioned don't conflict with Hive support, we can easily make it compatible. >Do you have an idea about where the connector should be developed? I don't think it makes sense for it to be

[DISCUSS] Supporting hive on DataSourceV2

2020-03-23 Thread JackyLee
Hi devs, I'd like to start a discussion about supporting Hive on DataSourceV2. We're now working on a project that uses DataSourceV2 to provide multiple-source support, and it works very well with the data lake solution, yet it does not yet support HiveTable. There are 3 reasons why we need to support

Question about spark on k8s

2020-01-03 Thread JackyLee
Hello, devs. In our scenario, we run Spark on Kata-like containers, and we found that the Kube-DNS domain is hard-coded. If Kube-DNS is not configured in the environment, tasks fail. My question is: why do we hard-code the Kube-DNS domain name? Isn't it better to read the domain name
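A sketch of the suggestion, in Scala: resolve the DNS suffix from configuration instead of hard-coding it. The config key and helper below are hypothetical, not an existing Spark setting:

```scala
import org.apache.spark.SparkConf

// Hypothetical config key; not an existing Spark setting.
val ClusterDomainKey = "spark.kubernetes.dns.clusterDomain"

// Instead of hard-coding the Kube-DNS suffix, read it from configuration so
// Kata-like environments without Kube-DNS can override it.
def driverServiceHostname(conf: SparkConf, service: String, namespace: String): String = {
  val domain = conf.get(ClusterDomainKey, "cluster.local")
  s"$service.$namespace.svc.$domain"
}
```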

Re: [VOTE] SPIP: Identifiers for multi-catalog Spark

2019-02-19 Thread JackyLee
+1

Re: Welcome Jose Torres as a Spark committer

2019-01-29 Thread JackyLee
Congrats, Joe! Best, Jacky

Re: Ask for reviewing on Structured Streaming PRs

2019-01-14 Thread JackyLee
Agree with rxin. Maybe we should consider these PRs, especially the large ones, after the DataSource V2 API is ready.

Re: Support SqlStreaming in spark

2018-12-27 Thread JackyLee
Hi, Wenchen. Thank you for your recognition of streaming on SQL. I have written the SQLStreaming design document: https://docs.google.com/document/d/19degwnIIcuMSELv6BQ_1VQI5AIVcvGeqOm5xE2-aRA0/edit# Your questions are answered here:

Re: Support SqlStreaming in spark

2018-12-25 Thread JackyLee
No problem

Re: Support SqlStreaming in spark

2018-12-21 Thread JackyLee
Hi Wenchen, I have been working on SQLStreaming for a year, and I have promoted it in my company. I have seen the designs for Kafka and Calcite, and I believe my design is better than theirs. They support pure SQL but not a table API for streaming. Users can only use the specified Streaming

Re: Support SqlStreaming in spark

2018-12-21 Thread JackyLee
Hi Wenchen and Arun Mahadevan, Thanks for your reply. SQLStreaming is not just a way to support pure SQL, but also a way to define a table API for streaming. I have redefined SQLStreaming to make it support the table API. Users can use SQL or the table API to run SQLStreaming. I will

Why use EMPTY_DATA_SCHEMA when creating a datasource table

2018-12-17 Thread JackyLee
Hi, everyone. I have some questions about creating a datasource table. In HiveExternalCatalog.createDataSourceTable, newSparkSQLSpecificMetastoreTable replaces the table schema with EMPTY_DATA_SCHEMA and table.partitionSchema. So, why do we use EMPTY_DATA_SCHEMA? Why not declare the schema
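As far as I understand it, the mechanism being asked about keeps the real Spark schema as JSON in table properties while the metastore only sees a placeholder it can always accept. A simplified Scala sketch of that idea; the property key and helpers are illustrative, not the exact HiveExternalCatalog code:

```scala
import org.apache.spark.sql.types.{DataType, StructType}

// Illustrative property key and placeholder, simplified from the real code.
val SchemaProp = "spark.sql.sources.schema.json"
val EmptyDataSchema: StructType = new StructType().add("col", "array<string>")

// Store the Spark schema as JSON in table properties...
def schemaToProperties(schema: StructType): Map[String, String] =
  Map(SchemaProp -> schema.json)

// ...and read it back when the table is loaded, ignoring the placeholder columns.
def schemaFromProperties(props: Map[String, String]): Option[StructType] =
  props.get(SchemaProp).map(json => DataType.fromJson(json).asInstanceOf[StructType])
```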

Re: Public v2 interface location

2018-11-30 Thread JackyLee
Hi, Ryan Blue. I don't think it would be a good idea to add the sql-api module. I prefer to add sql-api to sql/core. SQL is just another representation of a Dataset, so there is no need to add a new module for this. Besides, it would be easier to add sql-api in core. By the way, I don't

Re: DataSourceV2 community sync #3

2018-11-27 Thread JackyLee
+1 Please add me to the Google Hangout invite.

Re: DataSourceV2 capability API

2018-11-12 Thread JackyLee
I don't know if it is the right thing to make the table API ContinuousScanBuilder -> ContinuousScan -> ContinuousBatch; it makes batch/micro-batch/continuous too different from each other. In my opinion, these are basically similar at the table level. So is it possible to design an API like this?
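The proposed API itself is cut off in this preview. For illustration only, here is one possible unified shape in Scala, with made-up trait names (it roughly matches the direction DataSourceV2 later took, but is not the author's cut-off proposal):

```scala
import org.apache.spark.sql.types.StructType

// Illustrative sketch: one scan abstraction shared by batch, micro-batch and
// continuous execution, instead of three parallel ScanBuilder/Scan hierarchies.
trait UnifiedBatch
trait UnifiedMicroBatchStream
trait UnifiedContinuousStream

trait UnifiedScan {
  def readSchema(): StructType
  def toBatch: UnifiedBatch
  def toMicroBatchStream(checkpointLocation: String): UnifiedMicroBatchStream
  def toContinuousStream(checkpointLocation: String): UnifiedContinuousStream
}

trait UnifiedScanBuilder {
  def build(): UnifiedScan
}
```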

Re: Plan on Structured Streaming in next major/minor release?

2018-11-04 Thread JackyLee
Can these things be added to this list? 1. [SPARK-24630] Support SQLStreaming in Spark. This patch defines the Table API for StructStreaming. 2. [SPARK-25937] Support user-defined schema in Kafka Source & Sink. This patch makes it easier for users to work with StructStreaming. 3. SS supports

Re: Support SqlStreaming in spark

2018-10-21 Thread JackyLee
The code of SQLStreaming has been pushed: https://github.com/apache/spark/pull/22575

Re: data source api v2 refactoring

2018-10-21 Thread JackyLee
I have pushed a patch for SQLStreaming, which resolves the problem just discussed. The JIRA: https://issues.apache.org/jira/browse/SPARK-24630 The patch: https://github.com/apache/spark/pull/22575 SQLStreaming defines the table API for StructStreaming, and the table APIs for

Re: Plan on Structured Streaming in next major/minor release?

2018-10-21 Thread JackyLee
Thanks for raising them. FYI, I believe this open issue could also be considered: https://issues.apache.org/jira/browse/SPARK-24630 A new ability to express Structured Streaming in pure SQL.

Re: Support SqlStreaming in spark

2018-06-28 Thread JackyLee
Spark JIRA: https://issues.apache.org/jira/projects/SPARK/issues/SPARK-24630 Benefits: Firstly, users who are unfamiliar with streaming can easily use SQL to run StructStreaming, especially when migrating offline tasks to real-time processing tasks. Secondly, support SQL API in StructStreaming
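For context, the kind of job the proposal targets currently has to be written through the DataFrame API. A minimal Structured Streaming example in Scala (the topic, broker, and checkpoint values are placeholders):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("sqlstreaming-context").getOrCreate()

// Today this Kafka-to-console pipeline is expressed through the DataFrame API;
// the SQLStreaming proposal aims to let the same job be written as a SQL query.
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092") // placeholder
  .option("subscribe", "events")                    // placeholder
  .load()
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

val query = stream.writeStream
  .format("console")
  .option("checkpointLocation", "/tmp/checkpoints/events") // placeholder
  .start()

query.awaitTermination()
```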

Support SqlStreaming in spark

2018-06-14 Thread JackyLee
Hello. Nowadays, more and more streaming products are beginning to support SQL streaming, such as KafkaSQL, Flink SQL and Storm SQL. Supporting SQL streaming can not only lower the barrier to entry for streaming, but also make streaming easier for everyone to adopt. At present, StructStreaming is