[GitHub] spark pull request #17527: [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java S...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17527 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17527#discussion_r110408142

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---

```diff
@@ -910,7 +911,8 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
     // When Hive rename partition for managed tables, it will create the partition location with
     // a default path generate by the new spec with lower cased partition column names. This is
     // unexpected and we need to rename them manually and alter the partition location.
-    val hasUpperCasePartitionColumn = partitionColumnNames.exists(col => col.toLowerCase != col)
+    val hasUpperCasePartitionColumn =
+      partitionColumnNames.exists(col => col.toLowerCase != col)
```

--- End diff --

@srowen, it seems a newline was mistakenly inserted here, and in the change below.
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/17527#discussion_r110395985

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CaseInsensitiveMap.scala ---

```diff
@@ -26,11 +28,12 @@
 class CaseInsensitiveMap[T] private (val originalMap: Map[String, T]) extends Map[String, T]
   with Serializable {

-  val keyLowerCasedMap = originalMap.map(kv => kv.copy(_1 = kv._1.toLowerCase))
+  val keyLowerCasedMap = originalMap.map(kv => kv.copy(_1 = kv._1.toLowerCase(Locale.ROOT)))
```

--- End diff --

I don't think there's a right answer... for consistency it seems like we should treat keys uniformly, and my gut says we want those to be locale-neutral, because they are generally never something localized or locale-specific.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17527#discussion_r110394983

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CaseInsensitiveMap.scala ---

```diff
@@ -26,11 +28,12 @@
 class CaseInsensitiveMap[T] private (val originalMap: Map[String, T]) extends Map[String, T]
   with Serializable {

-  val keyLowerCasedMap = originalMap.map(kv => kv.copy(_1 = kv._1.toLowerCase))
+  val keyLowerCasedMap = originalMap.map(kv => kv.copy(_1 = kv._1.toLowerCase(Locale.ROOT)))
```

--- End diff --

I am actually fine with leaving this as it is. I really think such cases are contrived and hardly possible in practice.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17527#discussion_r110392436

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CaseInsensitiveMap.scala ---

```diff
@@ -26,11 +28,12 @@
 class CaseInsensitiveMap[T] private (val originalMap: Map[String, T]) extends Map[String, T]
   with Serializable {

-  val keyLowerCasedMap = originalMap.map(kv => kv.copy(_1 = kv._1.toLowerCase))
+  val keyLowerCasedMap = originalMap.map(kv => kv.copy(_1 = kv._1.toLowerCase(Locale.ROOT)))
```

--- End diff --

Really, this is nitpicking on my part, though.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17527#discussion_r110392351

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CaseInsensitiveMap.scala ---

```diff
@@ -26,11 +28,12 @@
 class CaseInsensitiveMap[T] private (val originalMap: Map[String, T]) extends Map[String, T]
   with Serializable {

-  val keyLowerCasedMap = originalMap.map(kv => kv.copy(_1 = kv._1.toLowerCase))
+  val keyLowerCasedMap = originalMap.map(kv => kv.copy(_1 = kv._1.toLowerCase(Locale.ROOT)))
```

--- End diff --

Yes, hardly possible, really... not a user, but a developer could use a Turkish character for an external data source (external data sources can use this, to my knowledge; see [interfaces.scala#L73-L74](https://github.com/apache/spark/blob/04ee8cf633e17b6bf95225a8dd77bf2e06980eb3/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala#L73-L74) and [DataSource.scala#L306](https://github.com/apache/spark/blob/096df6d933c5326e5782aa8c5de842a0800eb369/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L306)). For example, the XML data source extends this: [DefaultSource.scala#L30](https://github.com/databricks/spark-xml/blob/master/src/main/scala/com/databricks/spark/xml/DefaultSource.scala#L30).

Though... the one case I finally came up with is using the map for arbitrary keys and values. Someone opened a PR to Spark XML (https://github.com/databricks/spark-xml/pull/247) that supports using arbitrary keys and values for writing out attributes to the root element.
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/17527#discussion_r110382743

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CaseInsensitiveMap.scala ---

```diff
@@ -26,11 +28,12 @@
 class CaseInsensitiveMap[T] private (val originalMap: Map[String, T]) extends Map[String, T]
   with Serializable {

-  val keyLowerCasedMap = originalMap.map(kv => kv.copy(_1 = kv._1.toLowerCase))
+  val keyLowerCasedMap = originalMap.map(kv => kv.copy(_1 = kv._1.toLowerCase(Locale.ROOT)))
```

--- End diff --

Yes, though would these keys ever involve a user-defined key string? I think they're all Spark properties, Hive properties, etc.
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/17527#discussion_r110338963

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala ---

```diff
@@ -82,8 +84,8 @@ case class OptimizeMetadataOnlyQuery(
   private def getPartitionAttrs(
       partitionColumnNames: Seq[String],
       relation: LogicalPlan): Seq[Attribute] = {
-    val partColumns = partitionColumnNames.map(_.toLowerCase).toSet
-    relation.output.filter(a => partColumns.contains(a.name.toLowerCase))
+    val partColumns = partitionColumnNames.map(_.toLowerCase(Locale.ROOT)).toSet
+    relation.output.filter(a => partColumns.contains(a.name.toLowerCase(Locale.ROOT)))
```

--- End diff --

Yes, the question is whether case-insensitive operations on schema elements, not just data, should be locale-neutral. The flip side to this argument is that a case-insensitive operation on a schema with the character "I" or "i" suddenly doesn't work as expected on a machine with the Turkish locale. I don't know which is rarer: a schema with these special characters, or a machine with this particular locale. I think I'd probably err on the side of not changing the behavior. I'll further revert the changes that touch things like column and table names.
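To make the trade-off above concrete, here is a small hedged Java sketch (the column names are made up for illustration) of how a case-insensitive schema-name match behaves when the JVM's default locale is Turkish:

```java
import java.util.Locale;

public class SchemaNameMatchDemo {
    public static void main(String[] args) {
        // Simulate a machine running with the Turkish default locale.
        Locale.setDefault(new Locale("tr"));

        String declared = "ID";  // column name as declared in the schema
        String queried  = "id";  // name as written in a query

        // Default-locale lowercasing turns "ID" into "ıd" (dotless ı, U+0131),
        // so the case-insensitive match silently fails on this machine.
        System.out.println(declared.toLowerCase().equals(queried.toLowerCase()));  // false

        // Locale.ROOT lowercasing matches as expected on any machine; the flip
        // side is that a schema genuinely containing "ı" is folded differently.
        System.out.println(declared.toLowerCase(Locale.ROOT)
            .equals(queried.toLowerCase(Locale.ROOT)));                            // true
    }
}
```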
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17527#discussion_r110317549

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---

```diff
@@ -328,7 +329,7 @@ object PartitioningUtils {
     } else {
       // TODO: Selective case sensitivity.
       val distinctPartColNames =
-        pathsWithPartitionValues.map(_._2.columnNames.map(_.toLowerCase())).distinct
+        pathsWithPartitionValues.map(_._2.columnNames.map(_.toLowerCase(Locale.ROOT))).distinct
```

--- End diff --

I think this might cause a problem similar to the one in https://github.com/apache/spark/pull/17527/files#r110317272.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17527#discussion_r110298557

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala ---

```diff
@@ -396,7 +397,7 @@ object PartitioningAwareFileIndex extends Logging {
       sessionOpt: Option[SparkSession]): Seq[FileStatus] = {
     logTrace(s"Listing $path")
     val fs = path.getFileSystem(hadoopConf)
-    val name = path.getName.toLowerCase
+    val name = path.getName.toLowerCase(Locale.ROOT)
```

--- End diff --

(This variable seems to be unused.)
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17527#discussion_r110317695

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ---

```diff
@@ -222,7 +225,7 @@ case class PreprocessTableCreation(sparkSession: SparkSession) extends Rule[Logi
     val columnNames = if (sparkSession.sessionState.conf.caseSensitiveAnalysis) {
       schema.map(_.name)
     } else {
-      schema.map(_.name.toLowerCase)
+      schema.map(_.name.toLowerCase(Locale.ROOT))
```

--- End diff --

Maybe it is not good to point out every similar instance, but let me point this one out since the change is big. This may be a similar instance to https://github.com/apache/spark/pull/17527/files#r110317272.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17527#discussion_r110317441

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---

```diff
@@ -128,7 +128,8 @@ object PartitioningUtils {
       // "hdfs://host:9000/invalidPath"
       // "hdfs://host:9000/path"
       // TODO: Selective case sensitivity.
-      val discoveredBasePaths = optDiscoveredBasePaths.flatten.map(_.toString.toLowerCase())
+      val discoveredBasePaths =
+        optDiscoveredBasePaths.flatten.map(_.toString.toLowerCase(Locale.ROOT))
```

--- End diff --

I am worried about this one too. It sounds like the path could contain Turkish characters, I guess...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17527#discussion_r110314669

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringKeyHashMap.scala ---

```diff
@@ -25,7 +27,7 @@ object StringKeyHashMap {
   def apply[T](caseSensitive: Boolean): StringKeyHashMap[T] = if (caseSensitive) {
     new StringKeyHashMap[T](identity)
   } else {
-    new StringKeyHashMap[T](_.toLowerCase)
+    new StringKeyHashMap[T](_.toLowerCase(Locale.ROOT))
```

--- End diff --

This only seems to be used in `SimpleFunctionRegistry`. I don't think we have Turkish characters in function names, and I don't think users will use other languages in function names, so this is probably fine.
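As a sketch of the design under discussion (a toy Java analogue of `StringKeyHashMap`, not Spark's actual class), choosing the key normalizer once, up front, with `Locale.ROOT` keeps function-name lookups stable regardless of the machine's default locale:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Toy map keyed by strings, with a normalizer chosen at construction time,
// mirroring the caseSensitive flag in Spark's StringKeyHashMap.apply.
public class StringKeyMap<V> {
    private final Map<String, V> base = new HashMap<>();
    private final UnaryOperator<String> normalize;

    public StringKeyMap(boolean caseSensitive) {
        if (caseSensitive) {
            this.normalize = UnaryOperator.identity();
        } else {
            // Locale-neutral folding: "SUBSTRING" -> "substring" everywhere,
            // even on a Turkish-locale JVM where plain toLowerCase()
            // would produce "substrıng" (dotless ı).
            this.normalize = s -> s.toLowerCase(Locale.ROOT);
        }
    }

    public void put(String key, V value) { base.put(normalize.apply(key), value); }
    public V get(String key) { return base.get(normalize.apply(key)); }

    public static void main(String[] args) {
        Locale.setDefault(new Locale("tr"));  // simulate a Turkish-locale machine
        StringKeyMap<Integer> registry = new StringKeyMap<>(false);
        registry.put("SUBSTRING", 1);
        System.out.println(registry.get("substring"));  // 1
    }
}
```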
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17527#discussion_r110315394

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala ---

```diff
@@ -82,8 +84,8 @@ case class OptimizeMetadataOnlyQuery(
   private def getPartitionAttrs(
       partitionColumnNames: Seq[String],
       relation: LogicalPlan): Seq[Attribute] = {
-    val partColumns = partitionColumnNames.map(_.toLowerCase).toSet
-    relation.output.filter(a => partColumns.contains(a.name.toLowerCase))
+    val partColumns = partitionColumnNames.map(_.toLowerCase(Locale.ROOT)).toSet
+    relation.output.filter(a => partColumns.contains(a.name.toLowerCase(Locale.ROOT)))
```

--- End diff --

I am a little bit worried about this change likewise. For example:

Before

```scala
scala> java.util.Locale.setDefault(new java.util.Locale("tr"))

scala> val partColumns = Seq("I").map(_.toLowerCase).toSet
partColumns: scala.collection.immutable.Set[String] = Set(ı)

scala> Seq("a", "ı", "I").filter(a => partColumns.contains(a.toLowerCase))
res13: Seq[String] = List(ı, I)
```

After

```scala
scala> val partColumns = Seq("I").map(_.toLowerCase(java.util.Locale.ROOT)).toSet
partColumns: scala.collection.immutable.Set[String] = Set(i)

scala> Seq("a", "ı", "I").filter(a => partColumns.contains(a.toLowerCase(java.util.Locale.ROOT)))
res14: Seq[String] = List(I)
```
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17527#discussion_r110314541

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CaseInsensitiveMap.scala ---

```diff
@@ -26,11 +28,12 @@
 class CaseInsensitiveMap[T] private (val originalMap: Map[String, T]) extends Map[String, T]
   with Serializable {

-  val keyLowerCasedMap = originalMap.map(kv => kv.copy(_1 = kv._1.toLowerCase))
+  val keyLowerCasedMap = originalMap.map(kv => kv.copy(_1 = kv._1.toLowerCase(Locale.ROOT)))
```

--- End diff --

Maybe nitpicking, and it is rarely possible, I guess. However, to my knowledge, this will affect the options users set via `spark.read.option(...)`. Namely, I think a case like the following is possible:

```scala
scala> java.util.Locale.setDefault(new java.util.Locale("tr"))

scala> val originalMap = Map("ı" -> 1, "I" -> 2)
originalMap: scala.collection.immutable.Map[String,Int] = Map(ı -> 1, I -> 2)
```

Before

```scala
scala> originalMap.map(kv => kv.copy(_1 = kv._1.toLowerCase))
res6: scala.collection.immutable.Map[String,Int] = Map(ı -> 2)
```

After

```scala
scala> originalMap.map(kv => kv.copy(_1 = kv._1.toLowerCase(java.util.Locale.ROOT)))
res7: scala.collection.immutable.Map[String,Int] = Map(ı -> 1, i -> 2)
```
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17527#discussion_r110317272

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFsRelation.scala ---

```diff
@@ -52,7 +54,11 @@ case class HadoopFsRelation(
   val schema: StructType = {
     val getColName: (StructField => String) =
-      if (sparkSession.sessionState.conf.caseSensitiveAnalysis) _.name else _.name.toLowerCase
+      if (sparkSession.sessionState.conf.caseSensitiveAnalysis) {
+        _.name
+      } else {
+        _.name.toLowerCase(Locale.ROOT)
+      }
```

--- End diff --

I think we should leave this one out. It seems `dataSchema` is the schema from the source, which is exposed to users. I think this could cause a problem, for example as below:

Before

```scala
import collection.mutable
import org.apache.spark.sql.types._

java.util.Locale.setDefault(new java.util.Locale("tr"))

val partitionSchema: StructType = StructType(StructField("I", StringType) :: Nil)
val dataSchema: StructType = StructType(StructField("ı", StringType) :: Nil)
val getColName: (StructField => String) = _.name.toLowerCase
val overlappedPartCols = mutable.Map.empty[String, StructField]
partitionSchema.foreach { partitionField =>
  if (dataSchema.exists(getColName(_) == getColName(partitionField))) {
    overlappedPartCols += getColName(partitionField) -> partitionField
  }
}
val schema = StructType(
  dataSchema.map(f => overlappedPartCols.getOrElse(getColName(f), f)) ++
    partitionSchema.filterNot(f => overlappedPartCols.contains(getColName(f))))
schema.fieldNames
```

prints

```scala
Array[String] = Array(I)
```

After

```scala
import collection.mutable
import org.apache.spark.sql.types._

java.util.Locale.setDefault(new java.util.Locale("tr"))

val partitionSchema: StructType = StructType(StructField("I", StringType) :: Nil)
val dataSchema: StructType = StructType(StructField("ı", StringType) :: Nil)
val getColName: (StructField => String) = _.name.toLowerCase(java.util.Locale.ROOT)
val overlappedPartCols = mutable.Map.empty[String, StructField]
partitionSchema.foreach { partitionField =>
  if (dataSchema.exists(getColName(_) == getColName(partitionField))) {
    overlappedPartCols += getColName(partitionField) -> partitionField
  }
}
val schema = StructType(
  dataSchema.map(f => overlappedPartCols.getOrElse(getColName(f), f)) ++
    partitionSchema.filterNot(f => overlappedPartCols.contains(getColName(f))))
schema.fieldNames
```

prints

```scala
Array[String] = Array(ı, I)
```
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17527#discussion_r110064613

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---

```diff
@@ -407,7 +408,7 @@ public UTF8String toLowerCase() {
   }

   private UTF8String toLowerCaseSlow() {
-    return fromString(toString().toLowerCase());
+    return fromString(toString().toLowerCase(Locale.ROOT));
```

--- End diff --

It seems there are a few cases here exposed to users. For example, this is used in the `Lower` and `InitCap` expressions, where, to my knowledge, the lower-cased results are exposed to users. In this case, IIRC the same argument (a consistent base, plus options for variants) was applied in [SPARK-18076](https://issues.apache.org/jira/browse/SPARK-18076). So I think it should be fine.
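A small hedged Java check of that point (class name is illustrative): `Locale.ROOT` is locale-neutral, not ASCII-only, so user-visible results of `Lower`-style operations still follow the default Unicode case mappings for accented and special characters; only locale-specific mappings, such as Turkish dotted/dotless i, change.

```java
import java.util.Locale;

public class RootCaseMappingDemo {
    public static void main(String[] args) {
        // Locale.ROOT still applies the default Unicode case mappings:
        System.out.println("ÉCOLE".toLowerCase(Locale.ROOT));   // école
        System.out.println("Straße".toUpperCase(Locale.ROOT));  // STRASSE (ß -> SS)

        // Only locale-specific mappings differ, e.g. Turkish I:
        System.out.println("I".toLowerCase(new Locale("tr")));  // ı (dotless)
        System.out.println("I".toLowerCase(Locale.ROOT));       // i
    }
}
```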
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/17527

[SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String toLowerCase "Turkish locale bug" causes Spark problems

## What changes were proposed in this pull request?

Add `Locale.ROOT` to all calls to String `toLowerCase` and `toUpperCase`, to avoid inadvertent locale-sensitive variation in behavior (aka the "Turkish locale problem"). The change looks large, but it is just adding `Locale.ROOT` (the locale with no country or language specified) to every call to these methods.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srowen/spark SPARK-20156

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17527.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17527

commit 59194388d8de07d258a1a952b2cc52c57962c77a
Author: Sean Owen
Date: 2017-04-04T08:47:18Z

    Add Locale.ROOT to all calls to String toLowerCase, toUpperCase, to avoid inadvertent locale-sensitive variation in behavior (aka the "Turkish locale problem")

commit ab5995c18595195df51b43c6358ed761de0181a4
Author: Sean Owen
Date: 2017-04-04T08:51:30Z

    Revert inadvertent change to R file
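For readers unfamiliar with the underlying issue, here is a minimal, self-contained Java sketch of the locale-sensitive behavior this PR guards against (the class name is illustrative):

```java
import java.util.Locale;

public class TurkishLocaleDemo {
    public static void main(String[] args) {
        Locale turkish = new Locale("tr");

        // Turkish has dotted and dotless i: lowercasing "I" yields U+0131 (ı),
        // and uppercasing "i" yields U+0130 (İ), under the Turkish locale.
        System.out.println("I".toLowerCase(turkish));      // ı
        System.out.println("i".toUpperCase(turkish));      // İ

        // Locale.ROOT applies the locale-neutral mappings, so ASCII
        // identifiers round-trip as expected on any machine.
        System.out.println("I".toLowerCase(Locale.ROOT));  // i
        System.out.println("i".toUpperCase(Locale.ROOT));  // I
    }
}
```

The same divergence silently occurs whenever `toLowerCase()` or `toUpperCase()` is called without an explicit locale on a JVM whose default locale is `tr`, which is why the patch threads `Locale.ROOT` through every call site.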