Re: [PR] [SPARK-38230][SQL] InsertIntoHadoopFsRelationCommand unnecessarily fetches details of partitions [spark]

2024-05-08 Thread via GitHub
zhouyifan279 commented on PR #41628: URL: https://github.com/apache/spark/pull/41628#issuecomment-2100506618 @jeanlyn > 2. Use partitionPaths to get matchingPartitions, then get customPartitionLocations like what we do in this PR. We can create a function `patitionPaths:

Re: [PR] [SPARK-38230][SQL] InsertIntoHadoopFsRelationCommand unnecessarily fetches details of partitions [spark]

2024-05-08 Thread via GitHub
jeanlyn commented on PR #41628: URL: https://github.com/apache/spark/pull/41628#issuecomment-2100456683 @zhouyifan279, I think the solution you mentioned can resolve the data inconsistency for custom partitions. But I don't know whether it's acceptable to communicate with the metastore in such low

Re: [PR] [SPARK-38230][SQL] InsertIntoHadoopFsRelationCommand unnecessarily fetches details of partitions [spark]

2024-05-08 Thread via GitHub
zhouyifan279 commented on PR #41628: URL: https://github.com/apache/spark/pull/41628#issuecomment-2100365610 To eliminate the data inconsistency issue, we should handle custom partitions in `HadoopMapReduceCommitProtocol.commitJob` instead of writing to the final output path and then moving

Re: [PR] [SPARK-38230][SQL] InsertIntoHadoopFsRelationCommand unnecessarily fetches details of partitions [spark]

2024-01-22 Thread via GitHub
bowenliang123 commented on PR #41628: URL: https://github.com/apache/spark/pull/41628#issuecomment-1903692866 +1 for this PR. It deserves a second chance at evaluation, and it is a reasonable approach to reducing the number of requests to, and the load on, the Hive metastore service.

Re: [PR] [SPARK-38230][SQL] InsertIntoHadoopFsRelationCommand unnecessarily fetches details of partitions [spark]

2024-01-22 Thread via GitHub
jeanlyn commented on PR #41628: URL: https://github.com/apache/spark/pull/41628#issuecomment-1903655119 Hi @npsables, we have been running this patch in production for several months, so you can try it safely. I don't think the community will merge this patch, because the

Re: [PR] [SPARK-38230][SQL] InsertIntoHadoopFsRelationCommand unnecessarily fetches details of partitions [spark]

2024-01-22 Thread via GitHub
npsables commented on PR #41628: URL: https://github.com/apache/spark/pull/41628#issuecomment-1903626788 Hi guys, will this ever get merged?

Re: [PR] [SPARK-38230][SQL] InsertIntoHadoopFsRelationCommand unnecessarily fetches details of partitions [spark]

2023-11-12 Thread via GitHub
github-actions[bot] closed pull request #41628: [SPARK-38230][SQL] InsertIntoHadoopFsRelationCommand unnecessarily fetches details of partitions URL: https://github.com/apache/spark/pull/41628

Re: [PR] [SPARK-38230][SQL] InsertIntoHadoopFsRelationCommand unnecessarily fetches details of partitions [spark]

2023-11-11 Thread via GitHub
github-actions[bot] commented on PR #41628: URL: https://github.com/apache/spark/pull/41628#issuecomment-1806954903 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.