[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960868#comment-15960868 ]
Barry Becker commented on SPARK-20226:
--------------------------------------

Only 11 columns. I did not want to wait 10 or 20 minutes on each run, so I only used 11. With 14 columns it would take 10 minutes or longer. I could try it again with 14 columns and see how much it helps. Maybe it would make a bigger difference in that case, but even waiting a minute for such a small dataset seems too long.

> Call to sqlContext.cacheTable takes an incredibly long time in some cases
> -------------------------------------------------------------------------
>
> Key: SPARK-20226
> URL: https://issues.apache.org/jira/browse/SPARK-20226
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.1.0
> Environment: linux or windows
> Reporter: Barry Becker
> Labels: cache
> Attachments: profile_indexer2.PNG, xyzzy.csv
>
>
> I have a case where the call to sqlContext.cacheTable can take an arbitrarily
> long time depending on the number of columns that are referenced in a
> withColumn expression applied to a dataframe.
> The dataset is small (20 columns, 7861 rows). The sequence to reproduce is the
> following:
> 1) Add a new column whose expression references 8 - 14 of the columns in the dataset.
> - If the expression references 8 columns, the call to cacheTable is fast, about *5 seconds*
> - If it references 11 columns, it is slow, about *60 seconds*
> - If it references 14 columns, it basically *takes forever*. I gave up
> after 10 minutes or so.
> The Column expression that is added basically just concatenates
> the columns together into a single string. If a number is concatenated with a
> string (or vice versa), the number is first converted to a string.
> The expression looks something like this:
> {code}
> `Plate` + `State` + `License Type` + `Summons Number` + `Issue Date` +
> `Violation Time` + `Violation` + `Judgment Entry Date` + `Fine Amount` +
> `Penalty Amount` + `Interest Amount`
> {code}
> which we then convert to a Column expression that looks like this:
> {code}
> UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF('Plate, 'State), 'License Type),
> UDF('Summons Number)), UDF('Issue Date)), 'Violation Time), 'Violation),
> UDF('Judgment Entry Date)), UDF('Fine Amount)), UDF('Penalty Amount)),
> UDF('Interest Amount))
> {code}
> where the UDFs are very simple functions that basically call toString
> and + as needed.
> 2) Apply a previously saved pipeline that includes some transformers.
> Here are the steps of the pipeline (extracted from parquet):
> -
> {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333200603,"sparkVersion":"2.1.0","uid":"strIdx_aeb04d2777cc","paramMap":{"handleInvalid":"skip","outputCol":"State_IDX__","inputCol":"State_CLEANED__"}}{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333200837,"sparkVersion":"2.1.0","uid":"strIdx_0164c4c13979","paramMap":{"inputCol":"License Type_CLEANED__","handleInvalid":"skip","outputCol":"License Type_IDX__"}}{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201068,"sparkVersion":"2.1.0","uid":"strIdx_25b6cbd02751","paramMap":{"inputCol":"Violation_CLEANED__","handleInvalid":"skip","outputCol":"Violation_IDX__"}}{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201282,"sparkVersion":"2.1.0","uid":"strIdx_aa12df0354d9","paramMap":{"handleInvalid":"skip","inputCol":"County_CLEANED__","outputCol":"County_IDX__"}}{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201552,"sparkVersion":"2.1.0","uid":"strIdx_babb120f3cc1","paramMap":{"handleInvalid":"skip","outputCol":"Issuing Agency_IDX__","inputCol":"Issuing Agency_CLEANED__"}}{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201759,"sparkVersion":"2.1.0","uid":"strIdx_5f2de9d9542d","paramMap":{"handleInvalid":"skip","outputCol":"Violation Status_IDX__","inputCol":"Violation Status_CLEANED__"}}{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333201987,"sparkVersion":"2.1.0",
>   "uid":"bucketizer_6f65ca9fa813",
>   "paramMap":{
>     "outputCol":"Summons Number_BINNED__","handleInvalid":"keep","splits":["-Inf",1.386630656E9,3.696078592E9,4.005258752E9,6.045063168E9,8.136507392E9,"Inf"],"inputCol":"Summons Number_CLEANED__"
>   }
> }{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202079,"sparkVersion":"2.1.0",
>   "uid":"bucketizer_f5db4fb8120e",
>   "paramMap":{
>     "splits":["-Inf",1.435215616E9,1.443855616E9,1.447271936E9,1.448222464E9,1.448395264E9,1.448481536E9,1.448827136E9,1.449259264E9,1.449432064E9,1.449518336E9,"Inf"],
>     "handleInvalid":"keep","outputCol":"Issue Date_BINNED__","inputCol":"Issue Date_CLEANED__"
>   }
> }{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202172,"sparkVersion":"2.1.0",
>   "uid":"bucketizer_74568a2a5cfd",
>   "paramMap":{
>     "handleInvalid":"keep","outputCol":"Fine Amount_BINNED__","inputCol":"Fine Amount_CLEANED__","splits":["-Inf",47.5,57.5,62.5,105.0,"Inf"]
>   }
> }{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202269,"sparkVersion":"2.1.0",
>   "uid":"bucketizer_109705dfdbcd",
>   "paramMap":{"splits":["-Inf",0.004999999888241291,"Inf"],"outputCol":"Interest Amount_BINNED__","handleInvalid":"keep","inputCol":"Interest Amount_CLEANED__"}
> }{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202362,"sparkVersion":"2.1.0",
>   "uid":"bucketizer_2b2e3d8a324f",
>   "paramMap":{
>     "handleInvalid":"keep","inputCol":"Reduction Amount_CLEANED__","outputCol":"Reduction Amount_BINNED__",
>     "splits":["-Inf",5.994999885559082,24.0,41.0,57.5,120.0,"Inf"]
>   }
> }{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202485,"sparkVersion":"2.1.0",
>   "uid":"bucketizer_4d44c2ebf489",
>   "paramMap":{
>     "splits":["-Inf",18.75,42.5,52.5,57.5,70.0050048828125,75.96499633789062,100.58499908447266,115.4949951171875,125.02000427246094,"Inf"],"handleInvalid":"keep",
>     "outputCol":"Payment Amount_BINNED__","inputCol":"Payment Amount_CLEANED__"
>   }
> }{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202587,"sparkVersion":"2.1.0",
>   "uid":"bucketizer_05a75eeef997",
>   "paramMap":{
>     "handleInvalid":"keep",
>     "splits":["-Inf",32.904998779296875,55.12000274658203,72.5,91.69999694824219,116.05500030517578,125.02999877929688,"Inf"],
>     "outputCol":"Amount Due_BINNED__","inputCol":"Amount Due_CLEANED__"
>   }
> }{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202678,"sparkVersion":"2.1.0",
>   "uid":"bucketizer_64b3ef2f97cf",
>   "paramMap":{"outputCol":"Precinct_BINNED__","handleInvalid":"keep","inputCol":"Precinct_CLEANED__","splits":["-Inf",0.5,23.5,"Inf"]}
> }{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.VectorAssembler","timestamp":1491333202774,"sparkVersion":"2.1.0",
>   "uid":"vecAssembler_932758a8f18e",
>   "paramMap":{
>     "outputCol":"_features_column__",
>     "inputCols":["State_IDX__","License Type_IDX__","Violation_IDX__","County_IDX__","Issuing Agency_IDX__","Violation Status_IDX__","Summons Number_BINNED__","Issue Date_BINNED__","Fine Amount_BINNED__","Interest Amount_BINNED__","Reduction Amount_BINNED__","Payment Amount_BINNED__","Amount Due_BINNED__","Precinct_BINNED__"]
>   }
> }{code}
> -
> {code}{"class":"org.apache.spark.ml.classification.NaiveBayesModel","timestamp":1491333202874,"sparkVersion":"2.1.0",
>   "uid":"nb_e4b24f3c08b0",
>   "paramMap":{
>     "probabilityCol":"_class_probability_column__",
>     "labelCol":"Penalty Amount_BINNED__",
>     "predictionCol":"_prediction_column_",
>     "modelType":"multinomial",
>     "featuresCol":"_features_column__",
>     "rawPredictionCol":"rawPrediction",
>     "smoothing":3.518236190922951E-4
>   }
> }{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.SQLTransformer","timestamp":1491333203106,"sparkVersion":"2.1.0",
>   "uid":"sql_1ea4c1b5c52e",
>   "paramMap":{"statement":"SELECT *, CAST(_prediction_column_ AS INT) AS `_*_prediction_label_column_*__` FROM __THIS__ /*cutInfo:[10.0,25.0]*/"}
> }{code}
> 3) Call cacheTable on sqlContext. The actual code used is:
> {code}
> val key = "foo"
> if (sqlContext.tableNames.contains(key))
>   sqlContext.dropTempTable(key)
> df.createOrReplaceTempView(key)
> sqlContext.cacheTable(key) // <-- this takes a very long time
> {code}
> When I step through cacheTable in the debugger (in CacheManager.cacheQuery),
> I see that the query "planToCache" is very large (see below).
> I don't know much about query plans. Is this sort of giant nested query plan
> expected in this case? Is it in any way typical? Does it explain why it takes
> a very long time to cache? Why would adding just a few more columns to the
> added column's expression result in a plan that takes exponentially longer?
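As a rough check on the "exponentially longer" wording, the timings from step 1 can be fitted to a simple geometric model. This is only a sketch: the model t(n) = t(8) * r^(n - 8) is an assumption, not something measured, and two data points cannot distinguish exponential growth from other steep growth. The object and method names are made up for illustration.

```scala
// Rough growth estimate from the timings quoted in step 1 of the report.
// Assumption (not from the report): time grows geometrically with the
// number of referenced columns, t(n) = t(8) * r^(n - 8).
object CacheTimingEstimate {
  // Observed timings from the report: 8 columns -> ~5 s, 11 columns -> ~60 s.
  val observed: Map[Int, Double] = Map(8 -> 5.0, 11 -> 60.0)

  // Per-column growth factor r implied by the two measured points.
  val growthPerColumn: Double =
    math.pow(observed(11) / observed(8), 1.0 / (11 - 8))

  // Extrapolated time in seconds for n referenced columns under the model.
  def projectedSeconds(n: Int): Double =
    observed(8) * math.pow(growthPerColumn, n - 8)

  def main(args: Array[String]): Unit = {
    println(f"growth per added column: $growthPerColumn%.2f x")
    println(f"projected time for 14 columns: ${projectedSeconds(14)}%.0f s")
  }
}
```

Under that assumption each extra column multiplies the time by roughly 2.29, and 14 columns would come out at about 720 seconds (12 minutes), which is at least consistent with giving up after 10 minutes or so.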
> {code} > SubqueryAlias foo123, `foo123` > +- Project [Plate#123, State#124, License Type#125, Summons Number#126, Issue > Date#127, Violation Time#128, Violation#129, Judgment Entry Date#130, Fine > Amount#131, Penalty Amount#132, Interest Amount#133, Reduction Amount#134, > Payment Amount#135, Amount Due#136, Precinct#137, County#138, Issuing > Agency#139, Violation Status#140, columnBasedOnManyCols#141, Penalty Amount > (predicted)#2363] > +- Project [Plate#123, Plate_CLEANED__#162, State#124, > State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons > Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue > Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, > Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment > Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty > Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, > Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, > ... 33 more fields] > +- Project [Plate#123, Plate_CLEANED__#162, State#124, > State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons > Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue > Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, > Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment > Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty > Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, > Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, > ... 
33 more fields] > +- SubqueryAlias sql_1ea4c1b5c52e_5640c7097aca, > `sql_1ea4c1b5c52e_5640c7097aca` > +- Project [Plate#123, Plate_CLEANED__#162, State#124, > State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons > Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue > Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, > Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment > Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty > Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, > Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, > ... 32 more fields] > +- Project [Plate#123, Plate_CLEANED__#162, State#124, > State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons > Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue > Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, > Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment > Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty > Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, > Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, > ... 31 more fields] > +- Project [Plate#123, Plate_CLEANED__#162, State#124, > State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons > Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue > Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, > Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment > Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty > Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, > Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, > ... 
30 more fields] > +- Project [Plate#123, Plate_CLEANED__#162, State#124, > State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons > Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue > Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, > Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment > Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty > Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, > Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, > ... 29 more fields] > +- Project [Plate#123, Plate_CLEANED__#162, > State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, > Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue > Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, > Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment > Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty > Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, > Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, > ... 28 more fields] > +- Project [Plate#123, Plate_CLEANED__#162, > State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, > Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue > Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, > Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment > Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty > Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, > Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, > ... 
27 more fields] > +- Project [Plate#123, Plate_CLEANED__#162, > State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, > Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue > Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, > Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment > Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty > Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, > Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, > ... 26 more fields] > +- Project [Plate#123, Plate_CLEANED__#162, > State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, > Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue > Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, > Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment > Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty > Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, > Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, > ... 25 more fields] > +- Project [Plate#123, > Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, > License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, > Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation > Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry > Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine > Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, > Interest Amount_CLEANED__#250, Interest Amount#133, Reduction > Amount_CLEANED__#251, Reduction Amount#134, ... 
24 more fields] > +- Project [Plate#123, > Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, > License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, > Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation > Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry > Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine > Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, > Interest Amount_CLEANED__#250, Interest Amount#133, Reduction > Amount_CLEANED__#251, Reduction Amount#134, ... 23 more fields] > +- Project [Plate#123, > Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, > License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, > Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation > Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry > Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine > Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, > Interest Amount_CLEANED__#250, Interest Amount#133, Reduction > Amount_CLEANED__#251, Reduction Amount#134, ... 22 more fields] > +- Project [Plate#123, > Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, > License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, > Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation > Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry > Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine > Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, > Interest Amount_CLEANED__#250, Interest Amount#133, Reduction > Amount_CLEANED__#251, Reduction Amount#134, ... 
21 more fields] > +- Project [Plate#123, > Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, > License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, > Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation > Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry > Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine > Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, > Interest Amount_CLEANED__#250, Interest Amount#133, Reduction > Amount_CLEANED__#251, Reduction Amount#134, ... 20 more fields] > +- Filter UDF(Violation > Status_CLEANED__#174) > +- Project [Plate#123, > Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, > License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, > Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation > Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry > Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine > Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, > Interest Amount_CLEANED__#250, Interest Amount#133, Reduction > Amount_CLEANED__#251, Reduction Amount#134, ... 19 more fields] > +- Filter > UDF(Issuing Agency_CLEANED__#173) > +- Project > [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License > Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons > Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, > Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, > Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, > Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, > Interest Amount_CLEANED__#250, Interest Amount#133, Reduction > Amount_CLEANED__#251, Reduction Amount#134, ... 
18 more fields] > +- Filter > UDF(County_CLEANED__#172) > +- Project > [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License > Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons > Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, > Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, > Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, > Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, > Interest Amount_CLEANED__#250, Interest Amount#133, Reduction > Amount_CLEANED__#251, Reduction Amount#134, ... 17 more fields] > +- > Filter UDF(Violation_CLEANED__#167) > +- > Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, > License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, > Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation > Time#128, Violation Time_CLEANED__#166, Violation#129, > Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry > Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty > Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, > Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, > ... 16 more fields] > +- > Filter UDF(License Type_CLEANED__#164) > > +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, > License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, > Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation > Time#128, Violation Time_CLEANED__#166, Violation#129, > Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry > Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty > Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, > Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, > ... 
15 more fields] > > +- Filter UDF(State_CLEANED__#163) > > +- Project [Plate#123, Plate_CLEANED__#162, State#124, > State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, CASE WHEN > isnull(Summons Number#126) THEN NaN ELSE Summons Number#126 END AS Summons > Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue > Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, > Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment > Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty > Amount#132, Penalty Amount_CLEANED__#213, CASE WHEN isnull(Interest > Amount#133) THEN NaN ELSE Interest Amount#133 END AS Interest > Amount_CLEANED__#250, Interest Amount#133, CASE WHEN isnull(Reduction > Amount#134) THEN NaN ELSE Reduction Amount#134 END AS Reduction > Amount_CLEANED__#251, Reduction Amount#134, ... 14 more fields] > > +- Project [Plate#123, Plate_CLEANED__#162, State#124, > State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons > Number#126, Issue Date#127, CASE WHEN isnull(Issue Date_CLEANED__#165) THEN > NaN ELSE Issue Date_CLEANED__#165 END AS Issue Date_CLEANED__#210, Violation > Time#128, Violation Time_CLEANED__#166, Violation#129, > Violation_CLEANED__#167, Judgment Entry Date#130, CASE WHEN isnull(Judgment > Entry Date_CLEANED__#168) THEN NaN ELSE Judgment Entry Date_CLEANED__#168 END > AS Judgment Entry Date_CLEANED__#211, Fine Amount#131, CASE WHEN isnull(Fine > Amount_CLEANED__#169) THEN NaN ELSE Fine Amount_CLEANED__#169 END AS Fine > Amount_CLEANED__#212, Penalty Amount#132, CASE WHEN isnull(Penalty > Amount_CLEANED__#170) THEN NaN ELSE Penalty Amount_CLEANED__#170 END AS > Penalty Amount_CLEANED__#213, Interest Amount#133, Reduction Amount#134, > Payment Amount#135, Amount Due#136, Precinct#137, ... 
9 more fields] > > +- Project [Plate#123, UDF(Plate#123) AS Plate_CLEANED__#162, > State#124, UDF(State#124) AS State_CLEANED__#163, License Type#125, > UDF(License Type#125) AS License Type_CLEANED__#164, Summons Number#126, > Issue Date#127, cast(Issue Date#127 as double) AS Issue Date_CLEANED__#165, > Violation Time#128, UDF(Violation Time#128) AS Violation Time_CLEANED__#166, > Violation#129, UDF(Violation#129) AS Violation_CLEANED__#167, Judgment Entry > Date#130, cast(Judgment Entry Date#130 as double) AS Judgment Entry > Date_CLEANED__#168, Fine Amount#131, cast(Fine Amount#131 as double) AS Fine > Amount_CLEANED__#169, Penalty Amount#132, cast(Penalty Amount#132 as double) > AS Penalty Amount_CLEANED__#170, Interest Amount#133, Reduction Amount#134, > Payment Amount#135, Amount Due#136, Precinct#137, ... 9 more fields] > > +- Project [Plate#6 AS Plate#123, State#7 AS State#124, > License Type#8 AS License Type#125, Summons Number#9 AS Summons Number#126, > Issue Date#10 AS Issue Date#127, Violation Time#11 AS Violation Time#128, > Violation#12 AS Violation#129, Judgment Entry Date#13 AS Judgment Entry > Date#130, Fine Amount#14 AS Fine Amount#131, Penalty Amount#15 AS Penalty > Amount#132, Interest Amount#16 AS Interest Amount#133, Reduction Amount#17 AS > Reduction Amount#134, Payment Amount#18 AS Payment Amount#135, Amount Due#19 > AS Amount Due#136, Precinct#20 AS Precinct#137, County#21 AS County#138, > Issuing Agency#22 AS Issuing Agency#139, Violation Status#23 AS Violation > Status#140, columnBasedOnManyCols#43 AS columnBasedOnManyCols#141] > > +- Project [Plate#6, State#7, License Type#8, Summons > Number#9, Issue Date#10, Violation Time#11, Violation#12, Judgment Entry > Date#13, Fine Amount#14, Penalty Amount#15, Interest Amount#16, Reduction > Amount#17, Payment Amount#18, Amount Due#19, Precinct#20, County#21, Issuing > Agency#22, Violation Status#23, > cast(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(Plate#6, State#7), License > Type#8), 
UDF(Summons Number#9)), UDF(Issue Date#10)), Violation Time#11), > Violation#12), UDF(Judgment Entry Date#13)), UDF(Fine Amount#14)), > UDF(Penalty Amount#15)), UDF(Interest Amount#16)) as string) AS > columnBasedOnManyCols#43] > > +- Relation[Plate#6,State#7,License Type#8,Summons > Number#9,Issue Date#10,Violation Time#11,Violation#12,Judgment Entry > Date#13,Fine Amount#14,Penalty Amount#15,Interest Amount#16,Reduction > Amount#17,Payment Amount#18,Amount Due#19,Precinct#20,County#21,Issuing > Agency#22,Violation Status#23] csv > {code}
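For reference, the conversion in step 1 from the infix `+` expression to the nested UDF form can be sketched as a left fold. This is a plain-Scala illustration that only builds the printed string. It does not use any Spark API, and it is not the actual conversion code. The `Col` type, the `isString` flags and the object name are hypothetical. The flags are inferred from which operands appear wrapped in `UDF(...)` in the report.

```scala
// Sketch of folding an infix concatenation into nested binary calls.
// "UDF" is only a printed placeholder, matching how the report renders
// user-defined functions. No Spark API is used here.
object NestedConcatSketch {
  // Hypothetical column descriptor: name plus whether it is already a string.
  final case class Col(name: String, isString: Boolean)

  // Non-string operands are wrapped in a toString-style UDF first.
  def operand(c: Col): String =
    if (c.isString) s"'${c.name}" else s"UDF('${c.name})"

  // Left fold: ((a + b) + c) + ... becomes UDF(UDF(UDF(a, b), c), ...).
  def nest(cols: Seq[Col]): String = {
    require(cols.nonEmpty, "need at least one column")
    cols.tail.foldLeft(operand(cols.head)) { (acc, c) =>
      s"UDF($acc, ${operand(c)})"
    }
  }

  // The 11 columns from the report. The isString flags are an assumption
  // inferred from which operands are wrapped in UDF(...) there.
  val reportColumns: Seq[Col] = Seq(
    Col("Plate", isString = true),
    Col("State", isString = true),
    Col("License Type", isString = true),
    Col("Summons Number", isString = false),
    Col("Issue Date", isString = false),
    Col("Violation Time", isString = true),
    Col("Violation", isString = true),
    Col("Judgment Entry Date", isString = false),
    Col("Fine Amount", isString = false),
    Col("Penalty Amount", isString = false),
    Col("Interest Amount", isString = false)
  )

  def main(args: Array[String]): Unit =
    println(nest(reportColumns))
}
```

With n referenced columns the fold produces n - 1 nested concatenation calls, so the 11-column case is 10 levels deep and each added column adds one more level.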