[ https://issues.apache.org/jira/browse/SPARK-45170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780627#comment-17780627 ]
PEIYUAN SUN commented on SPARK-45170:
-------------------------------------

What is the difference between this and [Frameless|https://typelevel.org/frameless/FeatureOverview.html]?

> Scala-specific improvements in Dataset[T] API
> ----------------------------------------------
>
>                 Key: SPARK-45170
>                 URL: https://issues.apache.org/jira/browse/SPARK-45170
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.4.1
>            Reporter: Danila Goloshchapov
>            Priority: Minor
>              Labels: SPIP
>
> *Q1.* What are you trying to do?
> The main idea is to use the power of Scala macros to give developers a more convenient and typesafe API for join conditions.
>
> *Q2.* What problem is this proposal NOT designed to solve?
> The R/Java/Python/DataFrame APIs are out of scope. The solution does not affect plan generation either.
>
> *Q3.* How is it done today, and what are the limits of current practice?
> Currently the join condition is specified via strings, which can lead to silly mistakes (typos, incompatible column types, etc.) and is sometimes hard to read (e.g. when several joins are chained and the final type is a tuple of tuples of tuples...).
>
> *Q4.* What is new in your approach and why do you think it will be successful?
> Scala macros can be used to extract the column name directly from a lambda (an extractor). As a side effect, it is possible to check the column type and refuse to build an inconsistent join expression (such as a boolean-timestamp comparison).
>
> *Q5.* Who cares? If you are successful, what difference will it make?
> Mainly Scala developers who prefer typesafe code: they would get a cleaner, nicer API that makes the codebase a bit clearer, especially when several chained joins are used.
>
> *Q6.* What are the risks?
> Overuse of macros may slow down compilation. In addition, macros are hard to maintain.
>
> *Q7.* How long will it take?
> The approach is already implemented as a separate [lib|https://github.com/Salamahin/joinwiz] that does a bit more than just provide an alternative API (for example, it abstracts Dataset[T] to F[T], which allows running some Spark-specific code without a Spark session for testing purposes). Adapting it would not be a hard job: a matter of several weeks.
>
> *Q8.* What are the mid-term and final "exams" to check for success?
> API convenience is very hard to estimate, as it is more or less a question of taste.
>
> *Appendix A*
> You may find examples of such a 'cleaner' API [here|https://github.com/Salamahin/joinwiz/blob/master/joinwiz_core/src/test/scala/joinwiz/ComputationEngineTest.scala].
> Note that backward and forward compatibility is achieved by introducing a brand-new API without modifying the old one.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
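As an illustration of the typesafe-join idea described in Q3/Q4 of the quoted SPIP, here is a minimal, Spark-free sketch. All names in it ({{TypedCond}}, {{=:=}}, {{innerJoin}}, {{TypesafeJoinSketch}}) are hypothetical stand-ins, not the actual joinwiz API and not a committed Spark API; a {{Seq}} plays the role of {{Dataset[T]}}. It shows the compile-time half of the proposal only: the real design would additionally use a macro to recover the column *name* from the lambda, which plain functions cannot do.

```scala
// Hypothetical sketch of a typesafe join condition built from lambda
// extractors, in the spirit of the SPIP. Not the joinwiz or Spark API.
object TypesafeJoinSketch {

  case class User(id: Long, name: String)
  case class Order(userId: Long, amount: BigDecimal)

  // A join condition that remembers the element types of both sides.
  final case class TypedCond[L, R](matches: (L, R) => Boolean)

  implicit class ExtractorOps[L, T](private val left: L => T) {
    // Compiles only when both lambdas extract the same type T, so e.g. a
    // boolean-vs-timestamp comparison is rejected at compile time.
    def =:=[R](right: R => T): TypedCond[L, R] =
      TypedCond((l, r) => left(l) == right(r))
  }

  // A toy inner join over Seq, standing in for Dataset[T].
  def innerJoin[L, R](ls: Seq[L], rs: Seq[R])(cond: TypedCond[L, R]): Seq[(L, R)] =
    for { l <- ls; r <- rs if cond.matches(l, r) } yield (l, r)
}
```

Used like this, the condition reads as plain field access rather than a string column name, so a typo or a type mismatch fails compilation instead of the job:

```scala
import TypesafeJoinSketch._

val users  = Seq(User(1L, "ann"), User(2L, "bob"))
val orders = Seq(Order(1L, BigDecimal(10)), Order(3L, BigDecimal(5)))

val joined = innerJoin(users, orders)(((_: User).id) =:= ((_: Order).userId))
```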