[GitHub] [hudi] nsivabalan commented on issue #2586: [SUPPORT] - How to guarantee snapshot isolation when reading Hudi tables in S3?

2021-04-02 Thread GitBox


nsivabalan commented on issue #2586:
URL: https://github.com/apache/hudi/issues/2586#issuecomment-812712095


   Closing this for now. Please feel free to reopen or open a new ticket. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on issue #2586: [SUPPORT] - How to guarantee snapshot isolation when reading Hudi tables in S3?

2021-03-30 Thread GitBox


nsivabalan commented on issue #2586:
URL: https://github.com/apache/hudi/issues/2586#issuecomment-810428893


   Once you have responded or have any questions/clarifications, could you 
please remove the "awaiting-user-response" label from the issue? If possible, 
add the "awaiting-community-help" label. 






[GitHub] [hudi] nsivabalan commented on issue #2586: [SUPPORT] - How to guarantee snapshot isolation when reading Hudi tables in S3?

2021-02-24 Thread GitBox


nsivabalan commented on issue #2586:
URL: https://github.com/apache/hudi/issues/2586#issuecomment-785510794


   A few options/questions:
   - Does your incremental ingestion contain updates, or just inserts? If it is 
inserts only but Hudi's file-sizing optimization is merging them into existing 
files, we can try turning off file sizing. 
   - In general, you can set the number of file versions retained based on the 
maximum read-query duration and the maximum number of ingestions that could 
happen within that time frame. For example, if a read query could take at most 
2 hours and you ingest into Hudi once every 10 minutes, you can set file 
versions retained to 12. 
   - Another option is to try a MERGE_ON_READ table. There, Hudi just does 
delta commits, which incur much less write amplification compared to 
COPY_ON_WRITE. You can set file versions retained to 3; delta commits do not 
count toward the minimum file versions retained. 
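The sizing rule above (max query duration divided by ingestion interval) can be sketched as a small set of write options. This is a sketch only: the `hoodie.*` option names below are taken from Hudi's cleaning and write configuration docs as I remember them, so verify them against the Hudi version you actually run.

```python
# Sketch of Hudi write options implementing the suggestions above.
# Verify option names against your Hudi version's documentation.

MAX_QUERY_HOURS = 2        # longest expected read query
INGEST_INTERVAL_MIN = 10   # one ingestion every 10 minutes

# Versions to retain = ingestions that can land while a query runs.
versions_retained = (MAX_QUERY_HOURS * 60) // INGEST_INTERVAL_MIN

hudi_options = {
    # Clean by file-version count rather than by commits.
    "hoodie.cleaner.policy": "KEEP_LATEST_FILE_VERSIONS",
    "hoodie.cleaner.fileversions.retained": str(versions_retained),
    # Optionally disable small-file sizing so pure inserts do not
    # get merged into (and rewrite) existing files.
    "hoodie.parquet.small.file.limit": "0",
    # Or switch to MERGE_ON_READ to get delta commits instead of
    # rewriting base files on every ingest.
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
}
```

With Spark, these would typically be passed as `df.write.format("hudi").options(**hudi_options)`.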







[GitHub] [hudi] nsivabalan commented on issue #2586: [SUPPORT] - How to guarantee snapshot isolation when reading Hudi tables in S3?

2021-02-24 Thread GitBox


nsivabalan commented on issue #2586:
URL: https://github.com/apache/hudi/issues/2586#issuecomment-785128911


   @bvaradar @n3nash: let me take a stab; let me know if my understanding is 
right. 
   The customer sets file versions retained to 1. 
   So, if there are two writes by the time a single lengthy query completes, 
the query could throw a FileNotFoundException, since the 2nd write would have 
deleted the 1st file version for all data files? 
   
   @Rap70r: in the meantime, do you necessarily need file versions retained to 
be 1? If not, can you try setting it to 3 and let us know whether you can 
still reproduce the issue. 
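The race described above can be sketched with a toy cleaner that keeps only the latest N versions of a data file. This is an illustration of the failure mode only, not Hudi's actual cleaner code; all names here are hypothetical.

```python
# Toy model: a cleaner retaining only the newest `retained` versions,
# and a long-running query that pinned version v1 at its start.

def clean(versions, retained):
    """Drop all but the newest `retained` versions (newest last)."""
    return versions[-retained:]

versions = ["v1"]          # table state when the long query starts
query_snapshot = "v1"      # file version the reader is scanning

# Two ingestions complete while the query is still running, each
# followed by cleaning with file versions retained = 1.
for new in ["v2", "v3"]:
    versions.append(new)
    versions = clean(versions, retained=1)

# The reader's pinned version is gone -> FileNotFoundException on S3.
assert query_snapshot not in versions

# With retained = 3, v1 would have survived both ingestions.
versions3 = clean(["v1", "v2", "v3"], retained=3)
assert query_snapshot in versions3
```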
   
   







[GitHub] [hudi] nsivabalan commented on issue #2586: [SUPPORT] - How to guarantee snapshot isolation when reading Hudi tables in S3?

2021-02-21 Thread GitBox


nsivabalan commented on issue #2586:
URL: https://github.com/apache/hudi/issues/2586#issuecomment-782850685


   Hudi follows MVCC, so there is isolation between writers and readers; you 
should not see any such issues.
   - "if any process updates the table under S3": by this, do you mean 
updating the Hudi dataset via the Spark datasource, DeltaStreamer, etc., or 
by some other means? 
   - Can you post the stack trace you see? Without any logs, it is going to be 
tough to debug this.
   - Can you post the configs you use to write to and read from Hudi? 
   - I assume you have just one writer at any point in time; can you please 
confirm?  


