[GitHub] [hudi] JoshuaZhuCN commented on issue #7322: [SUPPORT][HELP] SparkSQL can not read the latest change data without execute "refresh table xxx"

2022-12-07 Thread GitBox


JoshuaZhuCN commented on issue #7322:
URL: https://github.com/apache/hudi/issues/7322#issuecomment-1340612815

   @alexeykudinkin Hi, is there any way to work around this problem, for example 
by disabling some configuration parameter? Aside from manually refreshing the 
table (which would mean modifying a large amount of business code), there is no 
temporary mitigation. A bug that can only be worked around by modifying business 
code is a breaking defect that directly blocks upgrading and day-to-day use, so 
this problem should be fixed in a patch release.
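   For reference, the manual workaround under discussion (which the commenter 
wants to avoid scattering through business code) looks roughly like this; the 
table name is illustrative and `spark` is assumed to be an active SparkSession:

```scala
// Sketch of the manual workaround: explicitly invalidate Spark's cached
// relation before querying, so the latest Hudi commits become visible.
spark.sql("REFRESH TABLE db.hudi_table")
spark.sql("SELECT * FROM db.hudi_table").show()
```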


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] JoshuaZhuCN commented on issue #7322: [SUPPORT][HELP] SparkSQL can not read the latest change data without execute "refresh table xxx"

2022-12-13 Thread GitBox


JoshuaZhuCN commented on issue #7322:
URL: https://github.com/apache/hudi/issues/7322#issuecomment-1350194639

   > @JoshuaZhuCN the reason why it's not refreshing is b/c you're writing into 
a table by path and not by table identifier (ie `db.table`) -- therefore it 
bypasses the Catalog and goes straight to Hudi's Data Source which is not going 
to be refreshing the Spark caches.
   > 
   > This is not an issue and is an expected behavior. To address try writing 
into the table by its id (which in turn will involve the catalog refresh)
   
   @alexeykudinkin I don't understand what "write into the table by its id" 
means. Do you mean using SQL such as `insert into/update/delete from db.table` 
to write data?
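   A rough sketch of the distinction being drawn, assuming a DataFrame `df` and 
a map of Hudi write options `hudiOpts` (both illustrative names, not from the 
issue):

```scala
import org.apache.spark.sql.SaveMode

// Path-based write: bypasses the Spark catalog entirely, so any relation
// Spark has cached for this table goes stale until REFRESH TABLE is run.
df.write.format("hudi")
  .options(hudiOpts)
  .mode(SaveMode.Append)
  .save("/data/warehouse/db/table")

// Identifier-based write: goes through the catalog, which lets Spark
// invalidate its cached metadata for `db.table` as part of the write.
df.write.format("hudi")
  .options(hudiOpts)
  .mode(SaveMode.Append)
  .saveAsTable("db.table")
```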





[GitHub] [hudi] JoshuaZhuCN commented on issue #7322: [SUPPORT][HELP] SparkSQL can not read the latest change data without execute "refresh table xxx"

2022-12-13 Thread GitBox


JoshuaZhuCN commented on issue #7322:
URL: https://github.com/apache/hudi/issues/7322#issuecomment-1350206823

   @alexeykudinkin The problem I am encountering is not only that data written 
through the Spark datasource cannot be read afterwards, but also that data 
written by Flink with Hive sync cannot be read by Spark SQL. In other words, a 
SparkSQL query cannot immediately see new data written in any way other than 
Spark SQL itself. Therefore, I think this is a problem that needs to be solved.
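   When the writer is a different engine entirely (e.g. Flink with Hive sync), 
Spark has no write path of its own to hook into, so an explicit refresh on the 
reader side is the only stopgap. A minimal sketch, with illustrative table 
names:

```scala
// Data landed via Flink + Hive sync; Spark's cached relation is now stale.
// Refresh programmatically before reading (equivalent to SQL REFRESH TABLE).
spark.catalog.refreshTable("db.table")
val latest = spark.sql("SELECT count(*) FROM db.table")
latest.show()
```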





[GitHub] [hudi] JoshuaZhuCN commented on issue #7322: [SUPPORT][HELP] SparkSQL can not read the latest change data without execute "refresh table xxx"

2022-12-13 Thread GitBox


JoshuaZhuCN commented on issue #7322:
URL: https://github.com/apache/hudi/issues/7322#issuecomment-1350311469

   > > @alexeykudinkin i don't understand what "write into the table by its id" 
means, just using sql like insert into/update/delete from db.table to write 
data?
   > 
   > Correct. You can do the same from Spark DS.
   > 
   
   @alexeykudinkin I think the query engine should not restrict how data must be 
written in order to be queryable. Even for tables created by SparkSQL, the query 
engine should be able to see new data regardless of whether it was written 
through the Spark datasource, Spark SQL, the Java client, Flink SQL, or the 
Flink streaming API, without requiring users to perform extra operations for 
each different write path when querying.
   > > @alexeykudinkin At present, the problem I encounter is not only that the 
Spark datasource cannot be read after it is written, but also that the Spark 
sql cannot be read after it is written by Flink using hive sync. In other 
words, the SparkSQL query can not immediately read new data in any other way 
except by writing data in SQL. Therefore, I think this is a problem that needs 
to be solved
   > 
   > Interesting. Can you please create another issue specifically for this one 
as this hardly could be related?
   I'll verify it again.
   





[GitHub] [hudi] JoshuaZhuCN commented on issue #7322: [SUPPORT][HELP] SparkSQL can not read the latest change data without execute "refresh table xxx"

2022-12-13 Thread GitBox


JoshuaZhuCN commented on issue #7322:
URL: https://github.com/apache/hudi/issues/7322#issuecomment-1350344575

   > > @alexeykudinkin At present, the problem I encounter is not only that the 
Spark datasource cannot be read after it is written, but also that the Spark 
sql cannot be read after it is written by Flink using hive sync. In other 
words, the SparkSQL query can not immediately read new data in any other way 
except by writing data in SQL. Therefore, I think this is a problem that needs 
to be solved
   > 
   > Interesting. Can you please create another issue specifically for this one 
as this hardly could be related?
   
   @alexeykudinkin Filed here: https://github.com/apache/hudi/issues/7452

