[ 
https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960835#comment-15960835
 ] 

Liang-Chi Hsieh commented on SPARK-20226:
-----------------------------------------

How many columns were added in the runs above? I didn't see the long running 
time (at least > 10 minutes) that you reported in the JIRA description.

For big query plans, constraint propagation can hit a combinatorial explosion 
and block the driver for a long time. That is why we have the flag 
"spark.sql.constraintPropagation.enabled" to disable it.
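
A minimal sketch of turning the flag off, assuming a Spark build that includes 
this setting (the 2.1.0 release the issue is filed against may not have it). 
The session name "spark", the local master and the app name are placeholders 
for illustration, not taken from the reporter's code.

{code}
import org.apache.spark.sql.SparkSession

// Assumption: a local session, used here only for illustration.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("constraint-propagation-off")
  // Disable constraint propagation in the optimizer for this session.
  .config("spark.sql.constraintPropagation.enabled", "false")
  .getOrCreate()

// The flag can also be changed on an already running session.
spark.conf.set("spark.sql.constraintPropagation.enabled", "false")
{code}

The same key can be passed at submit time with 
--conf spark.sql.constraintPropagation.enabled=false. It should be set before 
the call to cacheTable, because the cost is paid while the plan is optimized.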

For relatively small query plans (which I suppose the runs above are, given 
their shorter running times), this flag doesn't make a significant difference.

Each time you cache the table after adding a column, Spark finishes planning 
the query plan at that point, so you will not hit the constraint propagation 
issue.

> Call to sqlContext.cacheTable takes an incredibly long time in some cases
> -------------------------------------------------------------------------
>
>                 Key: SPARK-20226
>                 URL: https://issues.apache.org/jira/browse/SPARK-20226
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>         Environment: linux or windows
>            Reporter: Barry Becker
>              Labels: cache
>         Attachments: profile_indexer2.PNG, xyzzy.csv
>
>
> I have a case where the call to sqlContext.cacheTable can take an arbitrarily 
> long time depending on the number of columns that are referenced in a 
> withColumn expression applied to a dataframe.
> The dataset is small (20 columns 7861 rows). The sequence to reproduce is the 
> following:
> 1) add a new column that references 8 - 14 of the columns in the dataset. 
>    - If I add 8 columns, then the call to cacheTable is fast - like *5 
> seconds*
>    - If I add 11 columns, then it is slow - like *60 seconds*
>    - and if I add 14 columns, then it basically *takes forever* - I gave up 
> after 10 minutes or so.
>       The Column expression that is added, is basically just concatenating 
> the columns together in a single string. If a number is concatenated on a 
> string (or vice versa) the number is first converted to a string.
>       The expression looks something like this:
> {code}
> `Plate` + `State` + `License Type` + `Summons Number` + `Issue Date` + 
> `Violation Time` + `Violation` + `Judgment Entry Date` + `Fine Amount` + 
> `Penalty Amount` + `Interest Amount`
> {code}
>         which we then convert to a Column expression that looks like this:
> {code}
> UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF('Plate, 'State), 'License Type), 
> UDF('Summons Number)), UDF('Issue Date)), 'Violation Time), 'Violation), 
> UDF('Judgment Entry Date)), UDF('Fine Amount)), UDF('Penalty Amount)), 
> UDF('Interest Amount))
> {code}
>        where the UDFs are very simple functions that basically call toString 
> and + as needed.
> 2) apply a pipeline that includes some transformers that was saved earlier. 
> Here are the steps of the pipeline (extracted from parquet)
>  - 
> {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333200603,"sparkVersion":"2.1.0","uid":"strIdx_aeb04d2777cc","paramMap":{"handleInvalid":"skip","outputCol":"State_IDX__","inputCol":"State_CLEANED__"}}{code}
>  - 
> {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333200837,"sparkVersion":"2.1.0","uid":"strIdx_0164c4c13979","paramMap":{"inputCol":"License
>  Type_CLEANED__","handleInvalid":"skip","outputCol":"License 
> Type_IDX__"}}{code}
>  - 
> {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201068,"sparkVersion":"2.1.0","uid":"strIdx_25b6cbd02751","paramMap":{"inputCol":"Violation_CLEANED__","handleInvalid":"skip","outputCol":"Violation_IDX__"}}{code}
>  - 
> {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201282,"sparkVersion":"2.1.0","uid":"strIdx_aa12df0354d9","paramMap":{"handleInvalid":"skip","inputCol":"County_CLEANED__","outputCol":"County_IDX__"}}{code}
>  - 
> {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201552,"sparkVersion":"2.1.0","uid":"strIdx_babb120f3cc1","paramMap":{"handleInvalid":"skip","outputCol":"Issuing
>  Agency_IDX__","inputCol":"Issuing Agency_CLEANED__"}}{code}
>  - 
> {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201759,"sparkVersion":"2.1.0","uid":"strIdx_5f2de9d9542d","paramMap":{"handleInvalid":"skip","outputCol":"Violation
>  Status_IDX__","inputCol":"Violation Status_CLEANED__"}}{code}
>  - 
> {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333201987,"sparkVersion":"2.1.0",
>     "uid":"bucketizer_6f65ca9fa813",
>       "paramMap":{
>         "outputCol":"Summons 
> Number_BINNED__","handleInvalid":"keep","splits":["-Inf",1.386630656E9,3.696078592E9,4.005258752E9,6.045063168E9,8.136507392E9,"Inf"],"inputCol":"Summons
>  Number_CLEANED__"
>        }
>    }{code}
>  - 
> {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202079,"sparkVersion":"2.1.0",
>     "uid":"bucketizer_f5db4fb8120e",
>     "paramMap":{
>          
> "splits":["-Inf",1.435215616E9,1.443855616E9,1.447271936E9,1.448222464E9,1.448395264E9,1.448481536E9,1.448827136E9,1.449259264E9,1.449432064E9,1.449518336E9,"Inf"],
>           "handleInvalid":"keep","outputCol":"Issue 
> Date_BINNED__","inputCol":"Issue Date_CLEANED__"
>        }
>    }{code}
>  - 
> {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202172,"sparkVersion":"2.1.0",
>     "uid":"bucketizer_74568a2a5cfd",
>       "paramMap":{
>         "handleInvalid":"keep","outputCol":"Fine 
> Amount_BINNED__","inputCol":"Fine 
> Amount_CLEANED__","splits":["-Inf",47.5,57.5,62.5,105.0,"Inf"]
>        }
>       }{code}
>  - 
> {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202269,"sparkVersion":"2.1.0",
>     "uid":"bucketizer_109705dfdbcd",
>       
> "paramMap":{"splits":["-Inf",0.004999999888241291,"Inf"],"outputCol":"Interest
>  Amount_BINNED__","handleInvalid":"keep","inputCol":"Interest 
> Amount_CLEANED__"}
>    }{code}
>  - 
> {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202362,"sparkVersion":"2.1.0",
>     "uid":"bucketizer_2b2e3d8a324f",
>       "paramMap":{
>          "handleInvalid":"keep","inputCol":"Reduction 
> Amount_CLEANED__","outputCol":"Reduction Amount_BINNED__",
>          "splits":["-Inf",5.994999885559082,24.0,41.0,57.5,120.0,"Inf"]
>        }
>    }{code}
>  - 
> {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202485,"sparkVersion":"2.1.0",
>      "uid":"bucketizer_4d44c2ebf489",
>      "paramMap":{
>        
> "splits":["-Inf",18.75,42.5,52.5,57.5,70.0050048828125,75.96499633789062,100.58499908447266,115.4949951171875,125.02000427246094,"Inf"],"handleInvalid":"keep",
>          "outputCol":"Payment Amount_BINNED__","inputCol":"Payment 
> Amount_CLEANED__"
>        }
>    }{code}
>  - 
> {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202587,"sparkVersion":"2.1.0",
>     "uid":"bucketizer_05a75eeef997",
>       "paramMap":{
>          "handleInvalid":"keep",
>          
> "splits":["-Inf",32.904998779296875,55.12000274658203,72.5,91.69999694824219,116.05500030517578,125.02999877929688,"Inf"],
>          "outputCol":"Amount Due_BINNED__","inputCol":"Amount Due_CLEANED__"
>        }
>    }{code}
>  - 
> {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202678,"sparkVersion":"2.1.0",
>     "uid":"bucketizer_64b3ef2f97cf",
>       
> "paramMap":{"outputCol":"Precinct_BINNED__","handleInvalid":"keep","inputCol":"Precinct_CLEANED__","splits":["-Inf",0.5,23.5,"Inf"]}
>    }{code}
>  - 
> {code}{"class":"org.apache.spark.ml.feature.VectorAssembler","timestamp":1491333202774,"sparkVersion":"2.1.0",
>     "uid":"vecAssembler_932758a8f18e",
>       "paramMap":{
>         "outputCol":"_features_column__",
>         "inputCols":["State_IDX__","License 
> Type_IDX__","Violation_IDX__","County_IDX__","Issuing 
> Agency_IDX__","Violation Status_IDX__","Summons Number_BINNED__","Issue 
> Date_BINNED__","Fine Amount_BINNED__","Interest Amount_BINNED__","Reduction 
> Amount_BINNED__","Payment Amount_BINNED__","Amount 
> Due_BINNED__","Precinct_BINNED__"]
>       }
>    }{code}
>  - 
> {code}{"class":"org.apache.spark.ml.classification.NaiveBayesModel","timestamp":1491333202874,"sparkVersion":"2.1.0",
>     "uid":"nb_e4b24f3c08b0",
>       "paramMap":{
>         "probabilityCol":"_class_probability_column__",
>         "labelCol":"Penalty Amount_BINNED__",
>         "predictionCol":"_prediction_column_",
>         "modelType":"multinomial",
>         "featuresCol":"_features_column__",
>         "rawPredictionCol":"rawPrediction",
>         "smoothing":3.518236190922951E-4
>        }
>    }{code}
>  - 
> {code}{"class":"org.apache.spark.ml.feature.SQLTransformer","timestamp":1491333203106,"sparkVersion":"2.1.0",
>     "uid":"sql_1ea4c1b5c52e",
>       "paramMap":{"statement":"SELECT *, CAST(_prediction_column_ AS INT) AS 
> `_*_prediction_label_column_*__` FROM __THIS__ /*cutInfo:[10.0,25.0]*/"}
>    }{code}
>    3) Call cacheTable on sqlContext. The actual code used is:
>    {code}
>     val key = "foo"
>     if (sqlContext.tableNames.contains(key))
>       sqlContext.dropTempTable(key)
>     df.createOrReplaceTempView(key)
>     sqlContext.cacheTable(key)        // <-- this takes a very long time
> {code}
> When I step through cacheTable in the debugger (in CacheManager.cacheQuery), 
> I see that the query "planToCache" is very large (see below). 
> I don't know much about query plans. Is this sort of giant nested query plan 
> expected in this case? Is it in any way typical? Does it explain why it takes 
> a very long time to cache? Why would adding just a few more columns to the 
> add column expression result in a plan that takes exponentially longer?
> {code}
> SubqueryAlias foo123, `foo123`
> +- Project [Plate#123, State#124, License Type#125, Summons Number#126, Issue 
> Date#127, Violation Time#128, Violation#129, Judgment Entry Date#130, Fine 
> Amount#131, Penalty Amount#132, Interest Amount#133, Reduction Amount#134, 
> Payment Amount#135, Amount Due#136, Precinct#137, County#138, Issuing 
> Agency#139, Violation Status#140, columnBasedOnManyCols#141, Penalty Amount 
> (predicted)#2363]
>    +- Project [Plate#123, Plate_CLEANED__#162, State#124, 
> State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons 
> Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue 
> Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, 
> Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment 
> Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty 
> Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, 
> Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, 
> ... 33 more fields]
>       +- Project [Plate#123, Plate_CLEANED__#162, State#124, 
> State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons 
> Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue 
> Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, 
> Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment 
> Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty 
> Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, 
> Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, 
> ... 33 more fields]
>          +- SubqueryAlias sql_1ea4c1b5c52e_5640c7097aca, 
> `sql_1ea4c1b5c52e_5640c7097aca`
>             +- Project [Plate#123, Plate_CLEANED__#162, State#124, 
> State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons 
> Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue 
> Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, 
> Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment 
> Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty 
> Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, 
> Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, 
> ... 32 more fields]
>                +- Project [Plate#123, Plate_CLEANED__#162, State#124, 
> State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons 
> Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue 
> Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, 
> Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment 
> Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty 
> Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, 
> Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, 
> ... 31 more fields]
>                   +- Project [Plate#123, Plate_CLEANED__#162, State#124, 
> State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons 
> Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue 
> Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, 
> Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment 
> Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty 
> Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, 
> Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, 
> ... 30 more fields]
>                      +- Project [Plate#123, Plate_CLEANED__#162, State#124, 
> State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons 
> Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue 
> Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, 
> Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment 
> Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty 
> Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, 
> Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, 
> ... 29 more fields]
>                         +- Project [Plate#123, Plate_CLEANED__#162, 
> State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, 
> Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue 
> Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, 
> Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment 
> Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty 
> Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, 
> Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, 
> ... 28 more fields]
>                            +- Project [Plate#123, Plate_CLEANED__#162, 
> State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, 
> Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue 
> Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, 
> Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment 
> Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty 
> Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, 
> Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, 
> ... 27 more fields]
>                               +- Project [Plate#123, Plate_CLEANED__#162, 
> State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, 
> Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue 
> Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, 
> Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment 
> Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty 
> Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, 
> Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, 
> ... 26 more fields]
>                                  +- Project [Plate#123, Plate_CLEANED__#162, 
> State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, 
> Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue 
> Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, 
> Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment 
> Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty 
> Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, 
> Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, 
> ... 25 more fields]
>                                     +- Project [Plate#123, 
> Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, 
> License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, 
> Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation 
> Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry 
> Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine 
> Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, 
> Interest Amount_CLEANED__#250, Interest Amount#133, Reduction 
> Amount_CLEANED__#251, Reduction Amount#134, ... 24 more fields]
>                                        +- Project [Plate#123, 
> Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, 
> License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, 
> Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation 
> Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry 
> Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine 
> Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, 
> Interest Amount_CLEANED__#250, Interest Amount#133, Reduction 
> Amount_CLEANED__#251, Reduction Amount#134, ... 23 more fields]
>                                           +- Project [Plate#123, 
> Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, 
> License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, 
> Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation 
> Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry 
> Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine 
> Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, 
> Interest Amount_CLEANED__#250, Interest Amount#133, Reduction 
> Amount_CLEANED__#251, Reduction Amount#134, ... 22 more fields]
>                                              +- Project [Plate#123, 
> Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, 
> License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, 
> Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation 
> Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry 
> Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine 
> Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, 
> Interest Amount_CLEANED__#250, Interest Amount#133, Reduction 
> Amount_CLEANED__#251, Reduction Amount#134, ... 21 more fields]
>                                                 +- Project [Plate#123, 
> Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, 
> License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, 
> Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation 
> Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry 
> Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine 
> Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, 
> Interest Amount_CLEANED__#250, Interest Amount#133, Reduction 
> Amount_CLEANED__#251, Reduction Amount#134, ... 20 more fields]
>                                                    +- Filter UDF(Violation 
> Status_CLEANED__#174)
>                                                       +- Project [Plate#123, 
> Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, 
> License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, 
> Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation 
> Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry 
> Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine 
> Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, 
> Interest Amount_CLEANED__#250, Interest Amount#133, Reduction 
> Amount_CLEANED__#251, Reduction Amount#134, ... 19 more fields]
>                                                          +- Filter 
> UDF(Issuing Agency_CLEANED__#173)
>                                                             +- Project 
> [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License 
> Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons 
> Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, 
> Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, 
> Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, 
> Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, 
> Interest Amount_CLEANED__#250, Interest Amount#133, Reduction 
> Amount_CLEANED__#251, Reduction Amount#134, ... 18 more fields]
>                                                                +- Filter 
> UDF(County_CLEANED__#172)
>                                                                   +- Project 
> [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License 
> Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons 
> Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, 
> Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, 
> Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, 
> Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, 
> Interest Amount_CLEANED__#250, Interest Amount#133, Reduction 
> Amount_CLEANED__#251, Reduction Amount#134, ... 17 more fields]
>                                                                      +- 
> Filter UDF(Violation_CLEANED__#167)
>                                                                         +- 
> Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, 
> License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, 
> Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation 
> Time#128, Violation Time_CLEANED__#166, Violation#129, 
> Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry 
> Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty 
> Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, 
> Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, 
> ... 16 more fields]
>                                                                            +- 
> Filter UDF(License Type_CLEANED__#164)
>                                                                               
> +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, 
> License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, 
> Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation 
> Time#128, Violation Time_CLEANED__#166, Violation#129, 
> Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry 
> Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty 
> Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, 
> Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, 
> ... 15 more fields]
>                                                                               
>    +- Filter UDF(State_CLEANED__#163)
>                                                                               
>       +- Project [Plate#123, Plate_CLEANED__#162, State#124, 
> State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, CASE WHEN 
> isnull(Summons Number#126) THEN NaN ELSE Summons Number#126 END AS Summons 
> Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue 
> Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, 
> Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment 
> Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty 
> Amount#132, Penalty Amount_CLEANED__#213, CASE WHEN isnull(Interest 
> Amount#133) THEN NaN ELSE Interest Amount#133 END AS Interest 
> Amount_CLEANED__#250, Interest Amount#133, CASE WHEN isnull(Reduction 
> Amount#134) THEN NaN ELSE Reduction Amount#134 END AS Reduction 
> Amount_CLEANED__#251, Reduction Amount#134, ... 14 more fields]
>                                                                               
>          +- Project [Plate#123, Plate_CLEANED__#162, State#124, 
> State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons 
> Number#126, Issue Date#127, CASE WHEN isnull(Issue Date_CLEANED__#165) THEN 
> NaN ELSE Issue Date_CLEANED__#165 END AS Issue Date_CLEANED__#210, Violation 
> Time#128, Violation Time_CLEANED__#166, Violation#129, 
> Violation_CLEANED__#167, Judgment Entry Date#130, CASE WHEN isnull(Judgment 
> Entry Date_CLEANED__#168) THEN NaN ELSE Judgment Entry Date_CLEANED__#168 END 
> AS Judgment Entry Date_CLEANED__#211, Fine Amount#131, CASE WHEN isnull(Fine 
> Amount_CLEANED__#169) THEN NaN ELSE Fine Amount_CLEANED__#169 END AS Fine 
> Amount_CLEANED__#212, Penalty Amount#132, CASE WHEN isnull(Penalty 
> Amount_CLEANED__#170) THEN NaN ELSE Penalty Amount_CLEANED__#170 END AS 
> Penalty Amount_CLEANED__#213, Interest Amount#133, Reduction Amount#134, 
> Payment Amount#135, Amount Due#136, Precinct#137, ... 9 more fields]
>                                                                               
>             +- Project [Plate#123, UDF(Plate#123) AS Plate_CLEANED__#162, 
> State#124, UDF(State#124) AS State_CLEANED__#163, License Type#125, 
> UDF(License Type#125) AS License Type_CLEANED__#164, Summons Number#126, 
> Issue Date#127, cast(Issue Date#127 as double) AS Issue Date_CLEANED__#165, 
> Violation Time#128, UDF(Violation Time#128) AS Violation Time_CLEANED__#166, 
> Violation#129, UDF(Violation#129) AS Violation_CLEANED__#167, Judgment Entry 
> Date#130, cast(Judgment Entry Date#130 as double) AS Judgment Entry 
> Date_CLEANED__#168, Fine Amount#131, cast(Fine Amount#131 as double) AS Fine 
> Amount_CLEANED__#169, Penalty Amount#132, cast(Penalty Amount#132 as double) 
> AS Penalty Amount_CLEANED__#170, Interest Amount#133, Reduction Amount#134, 
> Payment Amount#135, Amount Due#136, Precinct#137, ... 9 more fields]
>                                                                               
>                +- Project [Plate#6 AS Plate#123, State#7 AS State#124, 
> License Type#8 AS License Type#125, Summons Number#9 AS Summons Number#126, 
> Issue Date#10 AS Issue Date#127, Violation Time#11 AS Violation Time#128, 
> Violation#12 AS Violation#129, Judgment Entry Date#13 AS Judgment Entry 
> Date#130, Fine Amount#14 AS Fine Amount#131, Penalty Amount#15 AS Penalty 
> Amount#132, Interest Amount#16 AS Interest Amount#133, Reduction Amount#17 AS 
> Reduction Amount#134, Payment Amount#18 AS Payment Amount#135, Amount Due#19 
> AS Amount Due#136, Precinct#20 AS Precinct#137, County#21 AS County#138, 
> Issuing Agency#22 AS Issuing Agency#139, Violation Status#23 AS Violation 
> Status#140, columnBasedOnManyCols#43 AS columnBasedOnManyCols#141]
>                                                                               
>                   +- Project [Plate#6, State#7, License Type#8, Summons 
> Number#9, Issue Date#10, Violation Time#11, Violation#12, Judgment Entry 
> Date#13, Fine Amount#14, Penalty Amount#15, Interest Amount#16, Reduction 
> Amount#17, Payment Amount#18, Amount Due#19, Precinct#20, County#21, Issuing 
> Agency#22, Violation Status#23, 
> cast(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(Plate#6, State#7), License 
> Type#8), UDF(Summons Number#9)), UDF(Issue Date#10)), Violation Time#11), 
> Violation#12), UDF(Judgment Entry Date#13)), UDF(Fine Amount#14)), 
> UDF(Penalty Amount#15)), UDF(Interest Amount#16)) as string) AS 
> columnBasedOnManyCols#43]
>                                                                               
>                      +- Relation[Plate#6,State#7,License Type#8,Summons 
> Number#9,Issue Date#10,Violation Time#11,Violation#12,Judgment Entry 
> Date#13,Fine Amount#14,Penalty Amount#15,Interest Amount#16,Reduction 
> Amount#17,Payment Amount#18,Amount Due#19,Precinct#20,County#21,Issuing 
> Agency#22,Violation Status#23] csv
> {code}        


