Re: Please review (ValidateExternalType should return child in error)
Hi Maksim,

I would really appreciate it if you could review my PR [ https://github.com/apache/spark/pull/47522 ]. Would you mind taking a look at my changes? How can I improve my PR workflow to make it easier to follow from the reviewer's perspective?
Re: Please review (ValidateExternalType should return child in error)
Hi Michael,

I would really appreciate it if you could review my PR [ https://github.com/apache/spark/pull/47522 ], as your expertise in the SQL part of Apache Spark is invaluable. Would you mind taking a look at my changes?
Re: Please review (ValidateExternalType should return child in error)
Thank you, Bjørn.

My PR [ https://github.com/apache/spark/pull/47522 ] was updated to align with the guideline:

+ What changes were proposed in this pull request?
+ Why are the changes needed?
+ Does this PR introduce any user-facing change?
+ How was this patch tested?
+ Was this patch authored or co-authored using generative AI tooling?

On Sun, 25 Aug 2024 at 15:47, Bjørn Jørgensen wrote:

> Apache Spark does have a template for PRs:
> https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE
>
> On Sun, 25 Aug 2024 at 13:41, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> Unfortunately, it is not that straightforward:
>>
>> 1. Committer Votes: The PR needs a sufficient number of "+1" votes from *committers*.
>> 2. Review Process: Address feedback from the community and committers to ensure the PR meets the necessary standards.
>> 3. Approval: Once approved by committers, the PR can be merged into the main codebase.
>>
>> HTH
Re: Please review (ValidateExternalType should return child in error)
Thank you for your review.

Could you explain how to merge this commit into the upstream? I don't want this PR to be abandoned.

Best regards,
Mark Andreev

On Wed, 21 Aug 2024 at 23:08, Mich Talebzadeh wrote:

> Hi Mark,
>
> You have already done that and have made the request for review.
>
> +1 for me
>
> Mich Talebzadeh
Re: Please review (ValidateExternalType should return child in error)
Thank you, Mich.

What is the correct procedure to request a review?

On Tue, 20 Aug 2024 at 22:57, Mich Talebzadeh wrote:

> Hi Mark,
>
> I added a comment to the Jira ticket to clarify the description:
>
> When encountering mixed schema rows, the current error message "{actual} is not a valid external type for schema of {expected}" lacks sufficient detail to identify the problematic column. This ambiguity hinders troubleshooting and increases development time.
>
> To enhance error clarity, we propose incorporating the source column name into the error message. For example: "Column 'my_column' has an actual type of {actual} which is not a valid external type for the expected schema of {expected}."
>
> By providing this additional context, developers can more efficiently pinpoint and resolve schema mismatches.
>
> HTH
>
> Mich Talebzadeh

--
Best regards,
Mark Andreev
Re: Please review (ValidateExternalType should return child in error)
Hi,

Could you review my small PR, [SPARK-49044][SQL] ValidateExternalType should return a child in error ( https://github.com/apache/spark/pull/47522 )? The changes include tests that verify the results.

TL;DR: After the fix, the error message will contain extra information:

[B is not a valid external type for schema of string at getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 1, f3)

If you need more information, please let me know. If you're busy, please let me know the best time to reach you again.

--
Best regards,
Mark Andreev
Please review (ValidateExternalType should return child in error)
Hi Spark Devs,

Please review my PR [ https://github.com/apache/spark/pull/47522 ], which relates to ticket [ https://issues.apache.org/jira/browse/SPARK-49044 ].

Context: When rows have a mixed schema, the error message "{actual} is not a valid external type for schema of {expected}" does not identify the column that caused the problem. I suggest adding information about the source column.

Example:
https://github.com/mrk-andreev/example-spark-schema/blob/main/spark_4.0.0/src/test/scala/ErrorMsgSuite.scala

Before fix:
[B is not a valid external type for schema of string

After fix:
[B is not a valid external type for schema of string at getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 1, f3)

--
Best regards,
Mark Andreev
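To illustrate the idea behind the proposal, here is a minimal sketch in plain Python. It is not Spark's implementation and does not use any Spark API; `validate_row`, the `(name, type)` schema representation, and the exact wording of the added context are assumptions made up for this example. It only shows how appending the field position and name to the type-mismatch message makes the failing column identifiable.

```python
from typing import Any, Sequence, Tuple

# Hypothetical schema representation: (field name, expected Python type) pairs.
Schema = Sequence[Tuple[str, type]]


def validate_row(row: Sequence[Any], schema: Schema, with_context: bool = True) -> None:
    """Raise TypeError for the first value whose type does not match the schema.

    With with_context=False the message mimics the old, context-free wording;
    with with_context=True it also names the offending field.
    """
    for index, (value, (name, expected)) in enumerate(zip(row, schema)):
        # None is treated as valid here (nullable fields), which is an assumption.
        if value is None or isinstance(value, expected):
            continue
        message = (
            f"{type(value).__name__} is not a valid external type "
            f"for schema of {expected.__name__}"
        )
        if with_context:
            # The extra context: which field the bad value came from.
            message += f" at field {index} ({name})"
        raise TypeError(message)
```

For example, calling `validate_row(("a", 1, b"x"), [("f1", str), ("f2", int), ("f3", str)])` raises a `TypeError` that names field `f3`, whereas passing `with_context=False` produces a message with no hint about which of the three fields was wrong.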
[Suggest] Add geo function to core
Hi,

I suggest adding geographical functions to Spark core, similar to those in ClickHouse ( https://clickhouse.com/docs/en/sql-reference/functions/geo/ ):

- Geographical Coordinates Functions
- Geohash Functions
- H3 Indexes
- S2 Indexes

What do you think? What is the current policy on core evolution? Should we first create a separate module (a standalone repository outside of Apache) and merge it into the main branch once it proves successful?

--
Best regards,
Mark Andreev
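As a rough illustration of what a function in the "Geohash Functions" group does, here is a minimal Python sketch of the standard public geohash encoding (interleaved longitude/latitude bisection, five bits per base-32 character). The name `geohash_encode` and its signature are assumptions for this example only; it is not the ClickHouse function and not an existing or proposed Spark API.

```python
# Geohash base-32 alphabet (digits plus lowercase letters without a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 12) -> str:
    """Encode a latitude/longitude pair as a geohash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0          # bits accumulated for the current character
    bit_count = 0     # how many bits are in `bits` so far
    use_lon = True    # geohash alternates, starting with longitude

    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            # Every 5 bits map to one base-32 character.
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0

    return "".join(chars)
```

A shorter geohash is always a prefix of a longer one for the same point, which is what makes geohashes usable as a coarse spatial index key.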