[jira] [Comment Edited] (SPARK-39959) Recover SparkR CRAN check in GitHub Actions CI

Yikun Jiang (Jira) Wed, 03 Aug 2022 10:00:43 -0700


    [ https://issues.apache.org/jira/browse/SPARK-39959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17574864#comment-17574864 ]


Yikun Jiang edited comment on SPARK-39959 at 8/3/22 4:59 PM:
-------------------------------------------------------------

1. It looks like the Docker cache is not working, so a full refresh is needed.

   - [https://www.diffchecker.com/TpOlQsg1] Judging from the results, this
     might be related to a change in the GitHub Actions Docker configuration.

   - A full refresh of the Dockerfile is needed to make the cache work.

2. roxygen2 was upgraded to 7.2.1; this is likely the root cause of the
SparkR job failure.
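
One rough way to check whether an environment has picked up the suspected roxygen2 version is sketched below. The helper names `version_ge` and `roxygen2_version` are hypothetical, and the script assumes a `sort` that supports `-V` (version sort) and that `Rscript` is on the PATH when R is installed. It only reports the installed version and does not change anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when version $1 is greater than or equal to $2.
# Assumes a `sort` that supports -V (version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: print the installed roxygen2 version, or nothing when
# Rscript or the package is not available.
roxygen2_version() {
  command -v Rscript >/dev/null 2>&1 || return 0
  Rscript -e 'cat(as.character(packageVersion("roxygen2")))' 2>/dev/null || true
}

installed="$(roxygen2_version)"
if [ -z "$installed" ]; then
  echo "roxygen2 not found"
elif version_ge "$installed" "7.2.1"; then
  echo "roxygen2 $installed: at or above 7.2.1, the version suspected above"
else
  echo "roxygen2 $installed: older than 7.2.1"
fi
```

Running this inside the CI image and in a local environment where the check still passes would show whether the two differ in roxygen2 version.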


was (Author: yikunkero):
1. It looks like the Docker cache is not working, so a full refresh is needed.

   - [https://www.diffchecker.com/TpOlQsg1] Judging from the results, this
     might be related to a change in the GitHub Actions Docker configuration.

   - A full refresh of the Dockerfile is needed to make the cache work.

2. roxygen2 was upgraded to 7.2.1; this is likely the root cause.

> Recover SparkR CRAN check in GitHub Actions CI
> ----------------------------------------------
>
>                 Key: SPARK-39959
>                 URL: https://issues.apache.org/jira/browse/SPARK-39959
>             Project: Spark
>          Issue Type: Test
>          Components: Project Infra, SparkR
>    Affects Versions: 3.4.0
>            Reporter: Hyukjin Kwon
>            Priority: Major
>
> {code}
> cd R
> ./install-dev.sh
> ./check-cran.sh
> {code}
> fails (I think because of the latest dependencies of the R documentation
> build, e.g., rmarkdown or roxygen2), as shown below in the current CI run
> (https://github.com/apache/spark/runs/7623722912?check_suite_focus=true)
> {code}
> * checking for missing documentation entries ... WARNING
> Undocumented code objects:
>   ‘%<=>%’ ‘add_months’ ‘agg’ ‘approxCountDistinct’ ‘approxQuantile’
>   ‘approx_count_distinct’ ‘arrange’ ‘array_aggregate’ ‘array_contains’
>   ‘array_distinct’ ‘array_except’ ‘array_exists’ ‘array_filter’
>   ‘array_forall’ ‘array_intersect’ ‘array_join’ ‘array_max’ ‘array_min’
>   ‘array_position’ ‘array_remove’ ‘array_repeat’ ‘array_sort’
>   ‘array_to_vector’ ‘array_transform’ ‘array_union’ ‘arrays_overlap’
>   ‘arrays_zip’ ‘arrays_zip_with’ ‘as.DataFrame’ ‘as.data.frame’ ‘asc’
>   ‘asc_nulls_first’ ‘asc_nulls_last’ ‘ascii’ ‘assert_true’ ‘avg’
>   ‘awaitTermination’ ‘base64’ ‘between’ ‘bin’ ‘bit_length’ ‘bitwiseNOT’
>   ‘bitwise_not’ ‘broadcast’ ‘bround’ ‘cache’ ‘cacheTable’
>   ‘cancelJobGroup’ ‘cast’ ‘cbrt’ ‘ceil’ ‘checkpoint’ ‘clearCache’
>   ‘clearJobGroup’ ‘coalesce’ ‘collect’ ‘collect_list’ ‘collect_set’
>   ‘colnames’ ‘colnames<-’ ‘coltypes’ ‘coltypes<-’ ‘column’ ‘columns’
>   ‘concat’ ‘concat_ws’ ‘contains’ ‘conv’ ‘corr’ ‘cot’ ‘count’
>   ‘countDistinct’ ‘count_distinct’ ‘cov’ ‘covar_pop’ ‘covar_samp’
>   ‘crc32’ ‘createDataFrame’ ‘createExternalTable’
>   ‘createOrReplaceTempView’ ‘createTable’ ‘create_array’ ‘create_map’
>   ‘crossJoin’ ‘crosstab’ ‘csc’ ‘cube’ ‘cume_dist’ ‘currentCatalog’
>   ‘currentDatabase’ ‘current_date’ ‘current_timestamp’ ‘dapply’
>   ‘dapplyCollect’ ‘databaseExists’ ‘date_add’ ‘date_format’ ‘date_sub’
>   ‘date_trunc’ ‘datediff’ ‘dayofmonth’ ‘dayofweek’ ‘dayofyear’ ‘decode’
>   ‘degrees’ ‘dense_rank’ ‘desc’ ‘desc_nulls_first’ ‘desc_nulls_last’
>   ‘describe’ ‘distinct’ ‘drop’ ‘dropDuplicates’ ‘dropFields’
>   ‘dropTempTable’ ‘dropTempView’ ‘dropna’ ‘dtypes’ ‘element_at’
>   ‘encode’ ‘endsWith’ ‘except’ ‘exceptAll’ ‘explain’ ‘explode’
>   ‘explode_outer’ ‘expr’ ‘fillna’ ‘filter’ ‘first’ ‘flatten’
>   ‘format_number’ ‘format_string’ ‘freqItems’ ‘from_avro’ ‘from_csv’
>   ‘from_json’ ‘from_unixtime’ ‘from_utc_timestamp’ ‘functionExists’
>   ‘gapply’ ‘gapplyCollect’ ‘getDatabase’ ‘getField’ ‘getFunc’ ‘getItem’
>   ‘getLocalProperty’ ‘getNumPartitions’ ‘getTable’ ‘greatest’ ‘groupBy’
>   ‘group_by’ ‘grouping_bit’ ‘grouping_id’ ‘hash’ ‘hex’ ‘hint’
>   ‘histogram’ ‘hour’ ‘hypot’ ‘ilike’ ‘initcap’ ‘input_file_name’
>   ‘insertInto’ ‘install.spark’ ‘instr’ ‘intersect’ ‘intersectAll’
>   ‘isActive’ ‘isLocal’ ‘isNaN’ ‘isNotNull’ ‘isNull’ ‘isStreaming’
>   ‘isnan’ ‘join’ ‘kurtosis’ ‘lag’ ‘last’ ‘lastProgress’ ‘last_day’
>   ‘lead’ ‘least’ ‘levenshtein’ ‘like’ ‘limit’ ‘listCatalogs’
>   ‘listColumns’ ‘listDatabases’ ‘listFunctions’ ‘listTables’ ‘lit’
>   ‘loadDF’ ‘localCheckpoint’ ‘locate’ ‘lower’ ‘lpad’ ‘ltrim’
>   ‘make_date’ ‘map_concat’ ‘map_entries’ ‘map_filter’ ‘map_from_arrays’
>   ‘map_from_entries’ ‘map_keys’ ‘map_values’ ‘map_zip_with’ ‘max_by’
>   ‘md5’ ‘min_by’ ‘minute’ ‘monotonically_increasing_id’ ‘month’
>   ‘months_between’ ‘mutate’ ‘n’ ‘n_distinct’ ‘na.omit’ ‘nanvl’ ‘negate’
>   ‘next_day’ ‘not’ ‘nth_value’ ‘ntile’ ‘octet_length’ ‘orderBy’
>   ‘otherwise’ ‘over’ ‘overlay’ ‘partitionBy’ ‘percent_rank’
>   ‘percentile_approx’ ‘persist’ ‘pivot’ ‘pmod’ ‘posexplode’
>   ‘posexplode_outer’ ‘predict’ ‘print.jobj’ ‘print.structField’
>   ‘print.structType’ ‘print.summary.DecisionTreeClassificationModel’
>   ‘print.summary.DecisionTreeRegressionModel’
>   ‘print.summary.GBTClassificationModel’
>   ‘print.summary.GBTRegressionModel’
>   ‘print.summary.GeneralizedLinearRegressionModel’
>   ‘print.summary.KSTest’
>   ‘print.summary.RandomForestClassificationModel’
>   ‘print.summary.RandomForestRegressionModel’ ‘printSchema’ ‘product’
>   ‘quarter’ ‘queryName’ ‘radians’ ‘raise_error’ ‘rand’ ‘randn’
>   ‘randomSplit’ ‘rangeBetween’ ‘rank’ ‘rbind’ ‘read.df’ ‘read.jdbc’
>   ‘read.json’ ‘read.ml’ ‘read.orc’ ‘read.parquet’ ‘read.stream’
>   ‘read.text’ ‘recoverPartitions’ ‘refreshByPath’ ‘refreshTable’
>   ‘regexp_extract’ ‘regexp_replace’ ‘registerTempTable’ ‘rename’
>   ‘repartition’ ‘repartitionByRange’ ‘repeat_string’ ‘reverse’ ‘rint’
>   ‘rlike’ ‘rollup’ ‘row_number’ ‘rowsBetween’ ‘rpad’ ‘rtrim’ ‘sample’
>   ‘sampleBy’ ‘sample_frac’ ‘saveAsTable’ ‘saveDF’ ‘schema’
>   ‘schema_of_csv’ ‘schema_of_json’ ‘sd’ ‘sec’ ‘second’ ‘select’
>   ‘selectExpr’ ‘setCheckpointDir’ ‘setCurrentCatalog’
>   ‘setCurrentDatabase’ ‘setJobDescription’ ‘setJobGroup’
>   ‘setLocalProperty’ ‘setLogLevel’ ‘sha1’ ‘sha2’ ‘shiftLeft’
>   ‘shiftRight’ ‘shiftRightUnsigned’ ‘shiftleft’ ‘shiftright’
>   ‘shiftrightunsigned’ ‘showDF’ ‘shuffle’ ‘signum’ ‘size’ ‘skewness’
>   ‘slice’ ‘sort_array’ ‘soundex’ ‘spark.addFile’ ‘spark.als’
>   ‘spark.assignClusters’ ‘spark.associationRules’
>   ‘spark.bisectingKmeans’ ‘spark.decisionTree’
>   ‘spark.findFrequentSequentialPatterns’ ‘spark.fmClassifier’
>   ‘spark.fmRegressor’ ‘spark.fpGrowth’ ‘spark.freqItemsets’
>   ‘spark.gaussianMixture’ ‘spark.gbt’ ‘spark.getSparkFiles’
>   ‘spark.getSparkFilesRootDirectory’ ‘spark.glm’ ‘spark.isoreg’
>   ‘spark.kmeans’ ‘spark.kstest’ ‘spark.lapply’ ‘spark.lda’ ‘spark.lm’
>   ‘spark.logit’ ‘spark.mlp’ ‘spark.naiveBayes’ ‘spark.perplexity’
>   ‘spark.posterior’ ‘spark.randomForest’ ‘spark.survreg’
>   ‘spark.svmLinear’ ‘sparkR.callJMethod’ ‘sparkR.callJStatic’
>   ‘sparkR.conf’ ‘sparkR.init’ ‘sparkR.newJObject’ ‘sparkR.session’
>   ‘sparkR.session.stop’ ‘sparkR.stop’ ‘sparkR.uiWebUrl’
>   ‘sparkR.version’ ‘sparkRHive.init’ ‘sparkRSQL.init’
>   ‘spark_partition_id’ ‘split_string’ ‘sql’ ‘startsWith’ ‘status’
>   ‘stddev’ ‘stddev_pop’ ‘stddev_samp’ ‘stopQuery’ ‘storageLevel’
>   ‘struct’ ‘structField’ ‘structField.character’ ‘structField.jobj’
>   ‘structType’ ‘structType.character’ ‘structType.jobj’
>   ‘structType.structField’ ‘subset’ ‘substring_index’ ‘sumDistinct’
>   ‘sum_distinct’ ‘summarize’ ‘summary’ ‘tableExists’ ‘tableNames’
>   ‘tableToDF’ ‘tables’ ‘take’ ‘timestamp_seconds’ ‘toDegrees’ ‘toJSON’
>   ‘toRadians’ ‘to_avro’ ‘to_csv’ ‘to_date’ ‘to_json’ ‘to_timestamp’
>   ‘to_utc_timestamp’ ‘transform’ ‘transform_keys’ ‘transform_values’
>   ‘translate’ ‘trim’ ‘unbase64’ ‘uncacheTable’ ‘unhex’ ‘union’
>   ‘unionAll’ ‘unionByName’ ‘unix_timestamp’ ‘unpersist’ ‘upper’ ‘var’
>   ‘var_pop’ ‘var_samp’ ‘variance’ ‘vector_to_array’ ‘weekofyear’ ‘when’
>   ‘where’ ‘window’ ‘windowOrderBy’ ‘windowPartitionBy’ ‘withColumn’
>   ‘withColumnRenamed’ ‘withField’ ‘withWatermark’ ‘write.df’
>   ‘write.jdbc’ ‘write.json’ ‘write.ml’ ‘write.orc’ ‘write.parquet’
>   ‘write.stream’ ‘write.text’ ‘xxhash64’ ‘year’
> {code}
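
When comparing runs, it can help to scan a saved copy of the check output for the same symptoms instead of reading it by eye. The following is a minimal sketch that assumes the output was saved to a file whose path is passed as the first argument; the helper names are hypothetical, and the patterns are based only on the output quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count "* checking ... WARNING" status lines in the
# check log file given as $1. Prints 0 when there are none.
count_check_warnings() {
  grep -c -E '^\* checking .* \.\.\. WARNING$' "$1" || true
}

# Hypothetical helper: succeed only if the log reports undocumented objects.
has_undocumented_objects() {
  grep -q '^Undocumented code objects:' "$1"
}

# Usage sketch: pass the path of a saved check log as the first argument.
if [ "$#" -ge 1 ] && [ -f "$1" ]; then
  echo "WARNING count: $(count_check_warnings "$1")"
  if has_undocumented_objects "$1"; then
    echo "missing documentation entries reported"
  fi
fi
```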



