[ https://issues.apache.org/jira/browse/SPARK-8907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625150#comment-14625150 ]
Ilya Ganelin commented on SPARK-8907:
-------------------------------------

[~rxin] The code for this in master has eliminated usage of zip and map as of [SPARK-8961|https://github.com/apache/spark/commit/33630883685eafcc3ee4521ea8363be342f6e6b4]. Do you think this can be further optimized and if so, how? There doesn't seem to be much within the existing catalyst expressions that would facilitate this, but I could be wrong.

The relevant code fragment is below:
{code}
val partitionPath = {
  val partitionPathBuilder = new StringBuilder
  var i = 0

  while (i < partitionColumns.length) {
    val col = partitionColumns(i)
    val partitionValueString = {
      val string = row.getString(i)
      if (string.eq(null)) defaultPartitionName else PartitioningUtils.escapePathName(string)
    }

    if (i > 0) {
      partitionPathBuilder.append(Path.SEPARATOR_CHAR)
    }

    partitionPathBuilder.append(s"$col=$partitionValueString")
    i += 1
  }

  partitionPathBuilder.toString()
}
{code}

> Speed up path construction in DynamicPartitionWriterContainer.outputWriterForRow
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-8907
>                 URL: https://issues.apache.org/jira/browse/SPARK-8907
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>
> Don't use zip and scala collection methods to avoid garbage collection
> {code}
>     val partitionPath = partitionColumns.zip(row.toSeq).map { case (col, rawValue) =>
>       val string = if (rawValue == null) null else String.valueOf(rawValue)
>       val valueString = if (string == null || string.isEmpty) {
>         defaultPartitionName
>       } else {
>         PartitioningUtils.escapePathName(string)
>       }
>       s"/$col=$valueString"
>     }.mkString.stripPrefix(Path.SEPARATOR)
> {code}
> We can probably use catalyst expressions themselves to construct the path,
> and then we can leverage code generation to do this.
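
For comparison, here is a minimal standalone sketch of the two path-building shapes quoted above, runnable without Spark or Hadoop on the classpath. It is an illustration under assumptions, not the code in master: the object name PartitionPathSketch, the methods withZip and withLoop, and the simplified escapePathName (which only escapes '/' and '=') are hypothetical stand-ins, the separator is hard-coded as '/', and the value of defaultPartitionName is an assumed placeholder. The sketch keeps the empty-string check from the issue description's version, which the quoted master fragment does not have, so that both methods return the same result.

```scala
object PartitionPathSketch {
  // Assumed placeholder for the defaultPartitionName used in the quoted code.
  val defaultPartitionName = "__HIVE_DEFAULT_PARTITION__"

  // Hypothetical stand-in for PartitioningUtils.escapePathName:
  // percent-encodes only '/' and '=' so the sketch stays self-contained.
  def escapePathName(s: String): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < s.length) {
      val c = s.charAt(i)
      if (c == '/' || c == '=') sb.append('%').append("%02X".format(c.toInt))
      else sb.append(c)
      i += 1
    }
    sb.toString
  }

  // Shape from the issue description: zip + map + mkString.
  // Allocates a tuple and an intermediate string per column.
  def withZip(cols: Array[String], values: Array[String]): String =
    cols.zip(values).map { case (col, raw) =>
      val v = if (raw == null || raw.isEmpty) defaultPartitionName else escapePathName(raw)
      s"/$col=$v"
    }.mkString.stripPrefix("/")

  // Shape from the comment: index-based while loop over one StringBuilder.
  // Here each piece is appended directly instead of going through an
  // interpolated string first (a choice of this sketch, not of the original).
  def withLoop(cols: Array[String], values: Array[String]): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < cols.length) {
      val raw = values(i)
      if (i > 0) sb.append('/')
      sb.append(cols(i)).append('=')
      if (raw == null || raw.isEmpty) sb.append(defaultPartitionName)
      else sb.append(escapePathName(raw))
      i += 1
    }
    sb.toString
  }

  def main(args: Array[String]): Unit = {
    val cols = Array("year", "region")
    val values = Array("2015", null)
    println(withZip(cols, values))
    println(withLoop(cols, values))
  }
}
```

Both methods are meant to return the same string for the same input; whether dropping the interpolated string makes a measurable difference would have to be checked with a benchmark.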