[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064972#comment-16064972 ]
Barry Becker commented on SPARK-20226:
--------------------------------------

Calling cache() on the dataframe after the addColumn used to make this run fast. But around the time that we upgraded to Spark 2.1.1 it got very slow again. Calling cache() on the dataframe does not seem to help any more.

If I hardcode the addColumn column expression to be
{code}
(((((((((((CAST(Plate AS STRING) + CAST(State AS STRING)) + CAST(License Type AS STRING)) + CAST(Violation Time AS STRING)) + CAST(Violation AS STRING)) + CAST(Judgment Entry Date AS STRING)) + CAST(Issue Date AS STRING)) + CAST(Summons Number AS STRING)) + CAST(Fine Amount AS STRING)) + CAST(Penalty Amount AS STRING)) + CAST(Interest Amount AS STRING)) + CAST(Violation AS STRING))
{code}
instead of
{code}
CAST(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(Plate, State), License Type), Violation Time), Violation), UDF(Judgment Entry Date)), UDF(Issue Date)), UDF(Summons Number)), UDF(Fine Amount)), UDF(Penalty Amount)), UDF(Interest Amount)), Violation) AS STRING)
{code}
which is what is generated by our expression parser, then the time goes from 70 seconds down to 10 seconds. Still slow, but not nearly as slow.

> Call to sqlContext.cacheTable takes an incredibly long time in some cases
> -------------------------------------------------------------------------
>
> Key: SPARK-20226
> URL: https://issues.apache.org/jira/browse/SPARK-20226
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.1.0
> Environment: linux or windows
> Reporter: Barry Becker
> Labels: cache
> Attachments: profile_indexer2.PNG, xyzzy.csv
>
>
> I have a case where the call to sqlContext.cacheTable can take an arbitrarily
> long time depending on the number of columns that are referenced in a
> withColumn expression applied to a dataframe.
> The dataset is small (20 columns, 7861 rows). The sequence to reproduce is the
> following:
> 1) add a new column that references 8 - 14 of the columns in the dataset.
> - If I add 8 columns, then the call to cacheTable is fast - like *5
> seconds*
> - If I add 11 columns, then it is slow - like *60 seconds*
> - and if I add 14 columns, then it basically *takes forever* - I gave up
> after 10 minutes or so.
> The Column expression that is added, is basically just concatenating
> the columns together in a single string. If a number is concatenated on a
> string (or vice versa) the number is first converted to a string.
> The expression looks something like this:
> {code}
> `Plate` + `State` + `License Type` + `Summons Number` + `Issue Date` +
> `Violation Time` + `Violation` + `Judgment Entry Date` + `Fine Amount` +
> `Penalty Amount` + `Interest Amount`
> {code}
> which we then convert to a Column expression that looks like this:
> {code}
> UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF('Plate, 'State), 'License Type),
> UDF('Summons Number)), UDF('Issue Date)), 'Violation Time), 'Violation),
> UDF('Judgment Entry Date)), UDF('Fine Amount)), UDF('Penalty Amount)),
> UDF('Interest Amount))
> {code}
> where the UDFs are very simple functions that basically call toString
> and + as needed.
> 2) apply a pipeline that includes some transformers that was saved earlier.
> Here are the steps of the pipeline (extracted from parquet) > - > {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333200603,"sparkVersion":"2.1.0","uid":"strIdx_aeb04d2777cc","paramMap":{"handleInvalid":"skip","outputCol":"State_IDX__","inputCol":"State_CLEANED__"}}{code} > - > {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333200837,"sparkVersion":"2.1.0","uid":"strIdx_0164c4c13979","paramMap":{"inputCol":"License > Type_CLEANED__","handleInvalid":"skip","outputCol":"License > Type_IDX__"}}{code} > - > {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201068,"sparkVersion":"2.1.0","uid":"strIdx_25b6cbd02751","paramMap":{"inputCol":"Violation_CLEANED__","handleInvalid":"skip","outputCol":"Violation_IDX__"}}{code} > - > {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201282,"sparkVersion":"2.1.0","uid":"strIdx_aa12df0354d9","paramMap":{"handleInvalid":"skip","inputCol":"County_CLEANED__","outputCol":"County_IDX__"}}{code} > - > {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201552,"sparkVersion":"2.1.0","uid":"strIdx_babb120f3cc1","paramMap":{"handleInvalid":"skip","outputCol":"Issuing > Agency_IDX__","inputCol":"Issuing Agency_CLEANED__"}}{code} > - > {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201759,"sparkVersion":"2.1.0","uid":"strIdx_5f2de9d9542d","paramMap":{"handleInvalid":"skip","outputCol":"Violation > Status_IDX__","inputCol":"Violation Status_CLEANED__"}}{code} > - > {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333201987,"sparkVersion":"2.1.0", > "uid":"bucketizer_6f65ca9fa813", > "paramMap":{ > "outputCol":"Summons > Number_BINNED__","handleInvalid":"keep","splits":["-Inf",1.386630656E9,3.696078592E9,4.005258752E9,6.045063168E9,8.136507392E9,"Inf"],"inputCol":"Summons > Number_CLEANED__" > } > }{code} > - > 
{code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202079,"sparkVersion":"2.1.0", > "uid":"bucketizer_f5db4fb8120e", > "paramMap":{ > > "splits":["-Inf",1.435215616E9,1.443855616E9,1.447271936E9,1.448222464E9,1.448395264E9,1.448481536E9,1.448827136E9,1.449259264E9,1.449432064E9,1.449518336E9,"Inf"], > "handleInvalid":"keep","outputCol":"Issue > Date_BINNED__","inputCol":"Issue Date_CLEANED__" > } > }{code} > - > {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202172,"sparkVersion":"2.1.0", > "uid":"bucketizer_74568a2a5cfd", > "paramMap":{ > "handleInvalid":"keep","outputCol":"Fine > Amount_BINNED__","inputCol":"Fine > Amount_CLEANED__","splits":["-Inf",47.5,57.5,62.5,105.0,"Inf"] > } > }{code} > - > {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202269,"sparkVersion":"2.1.0", > "uid":"bucketizer_109705dfdbcd", > > "paramMap":{"splits":["-Inf",0.004999999888241291,"Inf"],"outputCol":"Interest > Amount_BINNED__","handleInvalid":"keep","inputCol":"Interest > Amount_CLEANED__"} > }{code} > - > {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202362,"sparkVersion":"2.1.0", > "uid":"bucketizer_2b2e3d8a324f", > "paramMap":{ > "handleInvalid":"keep","inputCol":"Reduction > Amount_CLEANED__","outputCol":"Reduction Amount_BINNED__", > "splits":["-Inf",5.994999885559082,24.0,41.0,57.5,120.0,"Inf"] > } > }{code} > - > {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202485,"sparkVersion":"2.1.0", > "uid":"bucketizer_4d44c2ebf489", > "paramMap":{ > > "splits":["-Inf",18.75,42.5,52.5,57.5,70.0050048828125,75.96499633789062,100.58499908447266,115.4949951171875,125.02000427246094,"Inf"],"handleInvalid":"keep", > "outputCol":"Payment Amount_BINNED__","inputCol":"Payment > Amount_CLEANED__" > } > }{code} > - > {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202587,"sparkVersion":"2.1.0", > "uid":"bucketizer_05a75eeef997", > 
"paramMap":{ > "handleInvalid":"keep", > > "splits":["-Inf",32.904998779296875,55.12000274658203,72.5,91.69999694824219,116.05500030517578,125.02999877929688,"Inf"], > "outputCol":"Amount Due_BINNED__","inputCol":"Amount Due_CLEANED__" > } > }{code} > - > {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202678,"sparkVersion":"2.1.0", > "uid":"bucketizer_64b3ef2f97cf", > > "paramMap":{"outputCol":"Precinct_BINNED__","handleInvalid":"keep","inputCol":"Precinct_CLEANED__","splits":["-Inf",0.5,23.5,"Inf"]} > }{code} > - > {code}{"class":"org.apache.spark.ml.feature.VectorAssembler","timestamp":1491333202774,"sparkVersion":"2.1.0", > "uid":"vecAssembler_932758a8f18e", > "paramMap":{ > "outputCol":"_features_column__", > "inputCols":["State_IDX__","License > Type_IDX__","Violation_IDX__","County_IDX__","Issuing > Agency_IDX__","Violation Status_IDX__","Summons Number_BINNED__","Issue > Date_BINNED__","Fine Amount_BINNED__","Interest Amount_BINNED__","Reduction > Amount_BINNED__","Payment Amount_BINNED__","Amount > Due_BINNED__","Precinct_BINNED__"] > } > }{code} > - > {code}{"class":"org.apache.spark.ml.classification.NaiveBayesModel","timestamp":1491333202874,"sparkVersion":"2.1.0", > "uid":"nb_e4b24f3c08b0", > "paramMap":{ > "probabilityCol":"_class_probability_column__", > "labelCol":"Penalty Amount_BINNED__", > "predictionCol":"_prediction_column_", > "modelType":"multinomial", > "featuresCol":"_features_column__", > "rawPredictionCol":"rawPrediction", > "smoothing":3.518236190922951E-4 > } > }{code} > - > {code}{"class":"org.apache.spark.ml.feature.SQLTransformer","timestamp":1491333203106,"sparkVersion":"2.1.0", > "uid":"sql_1ea4c1b5c52e", > "paramMap":{"statement":"SELECT *, CAST(_prediction_column_ AS INT) AS > `_*_prediction_label_column_*__` FROM __THIS__ /*cutInfo:[10.0,25.0]*/"} > }{code} > 3) Call cacheTable on sqlContext. 
The actual code used is:
> {code}
> val key = "foo"
> if (sqlContext.tableNames.contains(key))
>   sqlContext.dropTempTable(key)
> df.createOrReplaceTempView(key)
> sqlContext.cacheTable(key) // <-- this takes a very long time
> {code}
> When I step through cacheTable in the debugger (in CacheManager.cacheQuery),
> I see that the query "planToCache" is very large (see below).
> I don't know much about query plans. Is this sort of giant nested query plan
> expected in this case? Is it in any way typical? Does it explain why it takes
> a very long time to cache? Why would adding just a few more columns to the
> add column expression result in a plan that takes exponentially longer?
> {code}
> SubqueryAlias foo123, `foo123`
> +- Project [Plate#123, State#124, License Type#125, Summons Number#126, Issue
> Date#127, Violation Time#128, Violation#129, Judgment Entry Date#130, Fine
> Amount#131, Penalty Amount#132, Interest Amount#133, Reduction Amount#134,
> Payment Amount#135, Amount Due#136, Precinct#137, County#138, Issuing
> Agency#139, Violation Status#140, columnBasedOnManyCols#141, Penalty Amount
> (predicted)#2363]
> +- Project [Plate#123, Plate_CLEANED__#162, State#124,
> State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons
> Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue
> Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166,
> Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment
> Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty
> Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250,
> Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134,
> ...
33 more fields] > +- Project [Plate#123, Plate_CLEANED__#162, State#124, > State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons > Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue > Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, > Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment > Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty > Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, > Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, > ... 33 more fields] > +- SubqueryAlias sql_1ea4c1b5c52e_5640c7097aca, > `sql_1ea4c1b5c52e_5640c7097aca` > +- Project [Plate#123, Plate_CLEANED__#162, State#124, > State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons > Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue > Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, > Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment > Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty > Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, > Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, > ... 32 more fields] > +- Project [Plate#123, Plate_CLEANED__#162, State#124, > State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons > Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue > Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, > Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment > Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty > Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, > Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, > ... 
31 more fields] > +- Project [Plate#123, Plate_CLEANED__#162, State#124, > State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons > Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue > Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, > Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment > Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty > Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, > Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, > ... 30 more fields] > +- Project [Plate#123, Plate_CLEANED__#162, State#124, > State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons > Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue > Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, > Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment > Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty > Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, > Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, > ... 29 more fields] > +- Project [Plate#123, Plate_CLEANED__#162, > State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, > Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue > Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, > Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment > Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty > Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, > Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, > ... 
28 more fields] > +- Project [Plate#123, Plate_CLEANED__#162, > State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, > Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue > Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, > Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment > Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty > Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, > Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, > ... 27 more fields] > +- Project [Plate#123, Plate_CLEANED__#162, > State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, > Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue > Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, > Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment > Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty > Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, > Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, > ... 26 more fields] > +- Project [Plate#123, Plate_CLEANED__#162, > State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, > Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue > Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, > Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment > Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty > Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, > Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, > ... 
25 more fields] > +- Project [Plate#123, > Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, > License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, > Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation > Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry > Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine > Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, > Interest Amount_CLEANED__#250, Interest Amount#133, Reduction > Amount_CLEANED__#251, Reduction Amount#134, ... 24 more fields] > +- Project [Plate#123, > Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, > License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, > Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation > Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry > Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine > Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, > Interest Amount_CLEANED__#250, Interest Amount#133, Reduction > Amount_CLEANED__#251, Reduction Amount#134, ... 23 more fields] > +- Project [Plate#123, > Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, > License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, > Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation > Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry > Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine > Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, > Interest Amount_CLEANED__#250, Interest Amount#133, Reduction > Amount_CLEANED__#251, Reduction Amount#134, ... 
22 more fields] > +- Project [Plate#123, > Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, > License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, > Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation > Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry > Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine > Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, > Interest Amount_CLEANED__#250, Interest Amount#133, Reduction > Amount_CLEANED__#251, Reduction Amount#134, ... 21 more fields] > +- Project [Plate#123, > Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, > License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, > Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation > Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry > Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine > Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, > Interest Amount_CLEANED__#250, Interest Amount#133, Reduction > Amount_CLEANED__#251, Reduction Amount#134, ... 20 more fields] > +- Filter UDF(Violation > Status_CLEANED__#174) > +- Project [Plate#123, > Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, > License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, > Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation > Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry > Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine > Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, > Interest Amount_CLEANED__#250, Interest Amount#133, Reduction > Amount_CLEANED__#251, Reduction Amount#134, ... 
19 more fields] > +- Filter > UDF(Issuing Agency_CLEANED__#173) > +- Project > [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License > Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons > Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, > Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, > Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, > Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, > Interest Amount_CLEANED__#250, Interest Amount#133, Reduction > Amount_CLEANED__#251, Reduction Amount#134, ... 18 more fields] > +- Filter > UDF(County_CLEANED__#172) > +- Project > [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License > Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons > Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, > Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, > Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, > Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, > Interest Amount_CLEANED__#250, Interest Amount#133, Reduction > Amount_CLEANED__#251, Reduction Amount#134, ... 17 more fields] > +- > Filter UDF(Violation_CLEANED__#167) > +- > Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, > License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, > Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation > Time#128, Violation Time_CLEANED__#166, Violation#129, > Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry > Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty > Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, > Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, > ... 
16 more fields] > +- > Filter UDF(License Type_CLEANED__#164) > > +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, > License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, > Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation > Time#128, Violation Time_CLEANED__#166, Violation#129, > Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry > Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty > Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, > Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, > ... 15 more fields] > > +- Filter UDF(State_CLEANED__#163) > > +- Project [Plate#123, Plate_CLEANED__#162, State#124, > State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, CASE WHEN > isnull(Summons Number#126) THEN NaN ELSE Summons Number#126 END AS Summons > Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue > Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, > Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment > Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty > Amount#132, Penalty Amount_CLEANED__#213, CASE WHEN isnull(Interest > Amount#133) THEN NaN ELSE Interest Amount#133 END AS Interest > Amount_CLEANED__#250, Interest Amount#133, CASE WHEN isnull(Reduction > Amount#134) THEN NaN ELSE Reduction Amount#134 END AS Reduction > Amount_CLEANED__#251, Reduction Amount#134, ... 
14 more fields] > > +- Project [Plate#123, Plate_CLEANED__#162, State#124, > State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons > Number#126, Issue Date#127, CASE WHEN isnull(Issue Date_CLEANED__#165) THEN > NaN ELSE Issue Date_CLEANED__#165 END AS Issue Date_CLEANED__#210, Violation > Time#128, Violation Time_CLEANED__#166, Violation#129, > Violation_CLEANED__#167, Judgment Entry Date#130, CASE WHEN isnull(Judgment > Entry Date_CLEANED__#168) THEN NaN ELSE Judgment Entry Date_CLEANED__#168 END > AS Judgment Entry Date_CLEANED__#211, Fine Amount#131, CASE WHEN isnull(Fine > Amount_CLEANED__#169) THEN NaN ELSE Fine Amount_CLEANED__#169 END AS Fine > Amount_CLEANED__#212, Penalty Amount#132, CASE WHEN isnull(Penalty > Amount_CLEANED__#170) THEN NaN ELSE Penalty Amount_CLEANED__#170 END AS > Penalty Amount_CLEANED__#213, Interest Amount#133, Reduction Amount#134, > Payment Amount#135, Amount Due#136, Precinct#137, ... 9 more fields] > > +- Project [Plate#123, UDF(Plate#123) AS Plate_CLEANED__#162, > State#124, UDF(State#124) AS State_CLEANED__#163, License Type#125, > UDF(License Type#125) AS License Type_CLEANED__#164, Summons Number#126, > Issue Date#127, cast(Issue Date#127 as double) AS Issue Date_CLEANED__#165, > Violation Time#128, UDF(Violation Time#128) AS Violation Time_CLEANED__#166, > Violation#129, UDF(Violation#129) AS Violation_CLEANED__#167, Judgment Entry > Date#130, cast(Judgment Entry Date#130 as double) AS Judgment Entry > Date_CLEANED__#168, Fine Amount#131, cast(Fine Amount#131 as double) AS Fine > Amount_CLEANED__#169, Penalty Amount#132, cast(Penalty Amount#132 as double) > AS Penalty Amount_CLEANED__#170, Interest Amount#133, Reduction Amount#134, > Payment Amount#135, Amount Due#136, Precinct#137, ... 
9 more fields] > > +- Project [Plate#6 AS Plate#123, State#7 AS State#124, > License Type#8 AS License Type#125, Summons Number#9 AS Summons Number#126, > Issue Date#10 AS Issue Date#127, Violation Time#11 AS Violation Time#128, > Violation#12 AS Violation#129, Judgment Entry Date#13 AS Judgment Entry > Date#130, Fine Amount#14 AS Fine Amount#131, Penalty Amount#15 AS Penalty > Amount#132, Interest Amount#16 AS Interest Amount#133, Reduction Amount#17 AS > Reduction Amount#134, Payment Amount#18 AS Payment Amount#135, Amount Due#19 > AS Amount Due#136, Precinct#20 AS Precinct#137, County#21 AS County#138, > Issuing Agency#22 AS Issuing Agency#139, Violation Status#23 AS Violation > Status#140, columnBasedOnManyCols#43 AS columnBasedOnManyCols#141] > > +- Project [Plate#6, State#7, License Type#8, Summons > Number#9, Issue Date#10, Violation Time#11, Violation#12, Judgment Entry > Date#13, Fine Amount#14, Penalty Amount#15, Interest Amount#16, Reduction > Amount#17, Payment Amount#18, Amount Due#19, Precinct#20, County#21, Issuing > Agency#22, Violation Status#23, > cast(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(Plate#6, State#7), License > Type#8), UDF(Summons Number#9)), UDF(Issue Date#10)), Violation Time#11), > Violation#12), UDF(Judgment Entry Date#13)), UDF(Fine Amount#14)), > UDF(Penalty Amount#15)), UDF(Interest Amount#16)) as string) AS > columnBasedOnManyCols#43] > > +- Relation[Plate#6,State#7,License Type#8,Summons > Number#9,Issue Date#10,Violation Time#11,Violation#12,Judgment Entry > Date#13,Fine Amount#14,Penalty Amount#15,Interest Amount#16,Reduction > Amount#17,Payment Amount#18,Amount Due#19,Precinct#20,County#21,Issuing > Agency#22,Violation Status#23] csv > {code}
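
For illustration only (this is not part of the original report): the comment at the top of this message found that a hardcoded, UDF-free expression cut the time from 70 seconds to 10. A rough sketch of how such an expression could be generated rather than typed by hand is shown below. The names `ConcatExpr` and `build` are hypothetical. The sketch uses the built-in `concat` function in place of the `+` in the reported expression, which is a substitution on my part, and `concat` returns NULL when any input is NULL, which may differ from what the reporter's UDFs do. It also assumes column names contain no backticks.

```scala
// Hypothetical helper, not taken from the report: builds a Spark SQL
// expression string that concatenates columns using only built-ins.
object ConcatExpr {
  // Cast every column to STRING and join the results with concat().
  // Column names are backtick-quoted so names with spaces still parse.
  // Assumption: names do not themselves contain backticks.
  def build(columns: Seq[String]): String = {
    require(columns.nonEmpty, "at least one column is required")
    columns
      .map(c => s"CAST(`${c}` AS STRING)")
      .mkString("concat(", ", ", ")")
  }
}

// Possible use with Spark, assuming a DataFrame `df` that has these columns:
//   import org.apache.spark.sql.functions.expr
//   val cols = Seq("Plate", "State", "License Type")
//   val out  = df.withColumn("columnBasedOnManyCols", expr(ConcatExpr.build(cols)))
```

Because the helper only produces a string, it can be checked without a Spark session. Whether it shortens the cacheTable call in this particular case would still need to be measured.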