[GitHub] spark pull request #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compres...

gatorsmile Mon, 22 Jan 2018 20:37:25 -0800

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20087#discussion_r163141749
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala ---
    @@ -55,18 +55,28 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
           customPartitionLocations: Map[TablePartitionSpec, String] = Map.empty,
           partitionAttributes: Seq[Attribute] = Nil): Set[String] = {
     
    -    val isCompressed = hadoopConf.get("hive.exec.compress.output", "false").toBoolean
    +    val isCompressed =
    +      fileSinkConf.getTableInfo.getOutputFileFormatClassName.toLowerCase(Locale.ROOT) match {
    +        case formatName if formatName.endsWith("orcoutputformat") =>
    +          // For ORC,"mapreduce.output.fileoutputformat.compress",
    +          // "mapreduce.output.fileoutputformat.compress.codec", and
    +          // "mapreduce.output.fileoutputformat.compress.type"
    +          // have no impact because it uses table properties to store compression information.
    --- End diff --
    
    The comment might not be correct now. We need to follow how the latest 
Hive behaves, if possible. The best way to try Hive (and other RDBMSs) is 
with Docker. Maybe you can try it in Docker?
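
For readers following the diff, the format-dependent check it introduces can be sketched roughly as below. This is a minimal illustration, not the PR's code: `CompressionFlagSketch`, its `isCompressed` signature, and the plain `Map` standing in for the Hadoop configuration are all hypothetical. The ORC branch simply encodes the claim in the diff's comment (the very claim this review asks to verify against the latest Hive), and the fallback branch is assumed to keep the behaviour of the removed line.

```scala
import java.util.Locale

// Hypothetical helper, not Spark's actual implementation.
object CompressionFlagSketch {
  // `conf` is a plain Map standing in for the Hadoop configuration (assumption).
  def isCompressed(outputFormatClassName: String, conf: Map[String, String]): Boolean =
    outputFormatClassName.toLowerCase(Locale.ROOT) match {
      // Assumption taken from the diff's comment: ORC stores compression
      // settings in table properties, so the output flag is ignored here.
      case name if name.endsWith("orcoutputformat") => false
      // Assumed fallback: same lookup as the line removed by the diff.
      case _ => conf.getOrElse("hive.exec.compress.output", "false").toBoolean
    }
}
```

If the latest Hive turns out to honour the `mapreduce.output.fileoutputformat.compress*` settings for ORC, the first branch of this sketch would be wrong and should follow Hive instead.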



