Maciej Szymkiewicz created SPARK-30645: ------------------------------------------
Summary: collect() support Unicode charactes tests fails on Windows Key: SPARK-30645 URL: https://issues.apache.org/jira/browse/SPARK-30645 Project: Spark Issue Type: Bug Components: SparkR, Tests Affects Versions: 3.0.0 Reporter: Maciej Szymkiewicz As-is [test_that("collect() support Unicode characters"|https://github.com/apache/spark/blob/d5b92b24c41b047c64a4d89cc4061ebf534f0995/R/pkg/tests/fulltests/test_sparkSQL.R#L850-L869] case seems to be system dependent, and doesn't work properly on Windows with CP1252 English locale: {code:r} library(SparkR) SparkR::sparkR.session() Sys.info() # sysname release version # "Windows" "Server x64" "build 17763" # nodename machine login # "WIN-5BLT6Q610KH" "x86-64" "Administrator" # user effective_user # "Administrator" "Administrator" Sys.getlocale() # [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252" lines <- c("{\"name\":\"안녕하세요\"}", "{\"name\":\"您好\", \"age\":30}", "{\"name\":\"こんにちは\", \"age\":19}", "{\"name\":\"Xin chào\"}") system(paste0("cat ", jsonPath)) # {"name":"<U+C548><U+B155><U+D558><U+C138><U+C694>"} # {"name":"<U+60A8><U+597D>", "age":30} # {"name":"<U+3053><U+3093><U+306B><U+3061><U+306F>", "age":19} # {"name":"Xin chào"} # [1] 0 jsonPath <- tempfile(pattern = "sparkr-test", fileext = ".tmp") writeLines(lines, jsonPath) df <- read.df(jsonPath, "json") printSchema(df) # root # |-- _corrupt_record: string (nullable = true) # |-- age: long (nullable = true) # |-- name: string (nullable = true) head(df) # _corrupt_record age name # 1 <NA> NA <U+C548><U+B155><U+D558><U+C138><U+C694> # 2 <NA> 30 <U+60A8><U+597D> # 3 <NA> 19 <U+3053><U+3093><U+306B><U+3061><U+306F> # 4 {"name":"Xin ch<U+FFFD>o"} NA <NA> {code} Problem becomes visible on AppVoyer when testthat is updated to 2.x, but somehow silenced when testthat 1.x is used. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org