Re: [PR] fix(11397): surface proper errors in ParquetSink [datafusion]

2024-07-12 Thread via GitHub


alamb merged PR #11399:
URL: https://github.com/apache/datafusion/pull/11399


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix(11397): surface proper errors in ParquetSink [datafusion]

2024-07-11 Thread via GitHub


comphead commented on PR #11399:
URL: https://github.com/apache/datafusion/pull/11399#issuecomment-2223171934

   I agree with @alamb early return is way better than ignoring error.
   Ignoring error leads to unpredictable behavior


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix(11397): surface proper errors in ParquetSink [datafusion]

2024-07-10 Thread via GitHub


alamb commented on code in PR #11399:
URL: https://github.com/apache/datafusion/pull/11399#discussion_r1672835991


##
datafusion/core/src/datasource/file_format/parquet.rs:
##
@@ -893,12 +893,9 @@ async fn send_arrays_to_col_writers(
 let mut next_channel = 0;
 for (array, field) in rb.columns().iter().zip(schema.fields()) {
 for c in compute_leaves(field, array)? {
-col_array_channels[next_channel]
-.send(c)
-.await
-.map_err(|_| {
-DataFusionError::Internal("Unable to send array to 
writer!".into())
-})?;
+// Do not surface error from closed channel.
+let _ = col_array_channels[next_channel].send(c).await;
+

Review Comment:
   I agree if we just ignore the error here I think this code will continue to 
furiously encode data only to throw it away. If it hits an error I think it 
should return early 
   
   In general, I don't there is any useful error message to be made if the 
channel errors on write. It happens when the other end of the channel has "hung 
up" (aka there is no receiver that will ever receive the message)
   
   SO in other words, I think this function needs to stop and return control if 
encounters an error sending.  
   
   Something like 
   
   ```rust
   // Do not surface error from closed channel (means something 
   // else hit an error, and the plan is shutting down).
   if let Err(_) = col_array_channels[next_channel].send(c).await {
   return Ok(());
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix(11397): surface proper errors in ParquetSink [datafusion]

2024-07-10 Thread via GitHub


wiedld commented on code in PR #11399:
URL: https://github.com/apache/datafusion/pull/11399#discussion_r1672791309


##
datafusion/core/src/datasource/file_format/parquet.rs:
##
@@ -893,12 +893,9 @@ async fn send_arrays_to_col_writers(
 let mut next_channel = 0;
 for (array, field) in rb.columns().iter().zip(schema.fields()) {
 for c in compute_leaves(field, array)? {
-col_array_channels[next_channel]
-.send(c)
-.await
-.map_err(|_| {
-DataFusionError::Internal("Unable to send array to 
writer!".into())
-})?;
+// Do not surface error from closed channel.
+let _ = col_array_channels[next_channel].send(c).await;
+

Review Comment:
   The tradeoff is that we don't error early, and extra future polling and 
message passing attempts will occur. I can add more code to remove that, if you 
want.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix(11397): surface proper errors in ParquetSink [datafusion]

2024-07-10 Thread via GitHub


wiedld commented on code in PR #11399:
URL: https://github.com/apache/datafusion/pull/11399#discussion_r1672791309


##
datafusion/core/src/datasource/file_format/parquet.rs:
##
@@ -893,12 +893,9 @@ async fn send_arrays_to_col_writers(
 let mut next_channel = 0;
 for (array, field) in rb.columns().iter().zip(schema.fields()) {
 for c in compute_leaves(field, array)? {
-col_array_channels[next_channel]
-.send(c)
-.await
-.map_err(|_| {
-DataFusionError::Internal("Unable to send array to 
writer!".into())
-})?;
+// Do not surface error from closed channel.
+let _ = col_array_channels[next_channel].send(c).await;
+

Review Comment:
   The tradeoff is that we don't error early, and extra future polling and 
attempted messages passing will occur.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix(11397): surface proper errors in ParquetSink [datafusion]

2024-07-10 Thread via GitHub


wiedld commented on code in PR #11399:
URL: https://github.com/apache/datafusion/pull/11399#discussion_r1672791309


##
datafusion/core/src/datasource/file_format/parquet.rs:
##
@@ -893,12 +893,9 @@ async fn send_arrays_to_col_writers(
 let mut next_channel = 0;
 for (array, field) in rb.columns().iter().zip(schema.fields()) {
 for c in compute_leaves(field, array)? {
-col_array_channels[next_channel]
-.send(c)
-.await
-.map_err(|_| {
-DataFusionError::Internal("Unable to send array to 
writer!".into())
-})?;
+// Do not surface error from closed channel.
+let _ = col_array_channels[next_channel].send(c).await;
+

Review Comment:
   The tradeoff is that we don't error early, and extra future polling and 
attempted message passing will occur. I can add more code to remove that, if 
you want.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix(11397): surface proper errors in ParquetSink [datafusion]

2024-07-10 Thread via GitHub


wiedld commented on code in PR #11399:
URL: https://github.com/apache/datafusion/pull/11399#discussion_r1672791309


##
datafusion/core/src/datasource/file_format/parquet.rs:
##
@@ -893,12 +893,9 @@ async fn send_arrays_to_col_writers(
 let mut next_channel = 0;
 for (array, field) in rb.columns().iter().zip(schema.fields()) {
 for c in compute_leaves(field, array)? {
-col_array_channels[next_channel]
-.send(c)
-.await
-.map_err(|_| {
-DataFusionError::Internal("Unable to send array to 
writer!".into())
-})?;
+// Do not surface error from closed channel.
+let _ = col_array_channels[next_channel].send(c).await;
+

Review Comment:
   The tradeoff is that we don't error early and extra messages may attempt to 
be passed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org