[jira] [Commented] (HDFS-12571) Ozone: remove spaces from the beginning of the hdfs script
[ https://issues.apache.org/jira/browse/HDFS-12571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17125635#comment-17125635 ]

Jialin Liu commented on HDFS-12571:
-----------------------------------

As of June 14, 2020, I'm still seeing the same issue:

{code:java}
sudo bash start-all.sh
Password:
Starting namenodes on [localhost]
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-functions.sh: line 398: syntax error near unexpected token `<'
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-functions.sh: line 398: ` done < <(for text in "${input[@]}"; do'
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 70: hadoop_deprecate_envvar: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 87: hadoop_bootstrap: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 104: hadoop_parse_args: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 105: shift: : numeric argument required
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 249: hadoop_need_reexec: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 257: hadoop_verify_user_perm: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/hdfs: line 218: hadoop_validate_classname: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/hdfs: line 219: hadoop_exit_with_usage: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 268: hadoop_add_client_opts: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 275: hadoop_subcommand_opts: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 278: hadoop_generic_java_subcmd_handler: command not found
Starting datanodes
[... the same hadoop-functions.sh / hadoop-config.sh / hdfs errors repeat ...]
Starting secondary namenodes [Jialin.local]
[... the same errors repeat; log truncated after hadoop_add_client_opts ...]
{code}
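The cascade in the log has a single root cause: the `syntax error near unexpected token `<'` aborts hadoop-functions.sh before any of its functions are defined, so every later `hadoop_*` call fails with "command not found". The construct on line 398, `done < <(...)`, is bash process substitution, which a plain POSIX sh cannot parse; stray whitespace at the top of a script can defeat the `#!` line and leave it running under the wrong interpreter, which is what this ticket is about. A minimal sketch of that failure mode, using an illustrative script path:

```shell
# Minimal sketch: process substitution (`< <(...)`) is a bash feature; a
# POSIX sh such as dash rejects it. The /tmp script path is illustrative.
cat > /tmp/psub_demo.sh <<'EOF'
while read -r line; do
  echo "got: $line"
done < <(printf 'a\nb\n')
EOF

bash /tmp/psub_demo.sh        # prints "got: a" and "got: b"
sh /tmp/psub_demo.sh || true  # on dash-like shells: "Syntax error: redirection unexpected"
```

When the shebang is honored and the script runs under bash, the loop works; under a stricter sh it fails at parse time, before a single command runs, exactly like hadoop-functions.sh above.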
[jira] [Commented] (SPARK-26261) Spark does not check completeness temporary file
[ https://issues.apache.org/jira/browse/SPARK-26261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710883#comment-16710883 ]

Jialin Liu commented on SPARK-26261:
------------------------------------

Our initial test is: we start a word-count workflow with persisting blocks to disk enabled. Once we are sure that some blocks are already on disk, we use the truncate command to cut off part of a block file, then compare the result with the result produced by the same workflow without fault injection.

> Spark does not check completeness temporary file
> ------------------------------------------------
>
>                 Key: SPARK-26261
>                 URL: https://issues.apache.org/jira/browse/SPARK-26261
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.2
>            Reporter: Jialin Liu
>            Priority: Minor
>
> Spark does not check temporary files' completeness. When persisting to disk is enabled on some RDDs, a bunch of temporary files are created in the blockmgr folder. The block manager is able to detect missing blocks, but it is not able to detect file content being modified during execution.
> Our initial test shows that if we truncate a block file before it is used by executors, the program finishes without detecting any error, but the result content is totally wrong.
> We believe there should be a checksum on every RDD file block, so these files are protected against corruption.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
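The injection step described in the comment can be sketched in a few shell commands; the directory and file names below are stand-ins for illustration, not Spark's actual blockmgr layout (on a real standalone worker the persisted blocks live under spark.local.dir, in paths like blockmgr-*/??/rdd_<rddId>_<partition>):

```shell
# Hypothetical fault injection against a stand-in block file.
mkdir -p /tmp/blockmgr-demo
printf '0123456789' > /tmp/blockmgr-demo/rdd_0_0   # pretend persisted block
block=/tmp/blockmgr-demo/rdd_0_0

size=$(wc -c < "$block")                 # block size before corruption
truncate -s $((size / 2)) "$block"       # chop off the second half in place
wc -c < "$block"                         # now 5 bytes; nothing notices
```

The point of the report is that nothing in the read path flags the shortened file: the job reruns against the damaged block and produces a wrong answer silently.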
[jira] [Created] (SPARK-26261) Spark does not check completeness temporary file
Jialin Liu created SPARK-26261:
-------------------------------

             Summary: Spark does not check completeness temporary file
                 Key: SPARK-26261
                 URL: https://issues.apache.org/jira/browse/SPARK-26261
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.3.2
            Reporter: Jialin Liu

Spark does not check temporary files' completeness. When persisting to disk is enabled on some RDDs, a bunch of temporary files are created in the blockmgr folder. The block manager is able to detect missing blocks, but it is not able to detect file content being modified during execution.

Our initial test shows that if we truncate a block file before it is used by executors, the program finishes without detecting any error, but the result content is totally wrong.

We believe there should be a checksum on every RDD file block, so these files are protected against corruption.
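A minimal sketch of the checksum protection the report asks for, assuming a sidecar-file scheme (the function names and file layout are invented for illustration; this is not Spark's actual API):

```shell
# Sketch of the proposed fix: pair every persisted block with a sidecar
# checksum written at persist time and verified before reuse.
write_block() {    # $1 = block path; stdin = block contents
  cat > "$1"
  sha256sum "$1" | awk '{print $1}' > "$1.sha256"
}

verify_block() {   # exit status 0 iff the block still matches its checksum
  printf '%s  %s\n' "$(cat "$1.sha256")" "$1" | sha256sum -c --quiet -
}
```

With verification on the read path, a truncated or bit-flipped block fails `verify_block` and can be recomputed or refetched instead of being silently consumed.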
[jira] [Created] (SPARK-26197) Spark master fails to detect driver process pause
Jialin Liu created SPARK-26197:
-------------------------------

             Summary: Spark master fails to detect driver process pause
                 Key: SPARK-26197
                 URL: https://issues.apache.org/jira/browse/SPARK-26197
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.3.2
            Reporter: Jialin Liu

I was using Spark 2.3.2 with a standalone cluster, submitting the job in cluster mode. After submitting the job, I deliberately paused the driver process (through the shell command "kill -stop (driver process id)") to see whether the master can detect this problem. The result shows that the driver is never marked as failed. All the executors try to talk back to the driver and give up after 10 minutes. The master can detect the executor failures and reassigns new executor processes to redo the job. Each new executor tries to create an RPC connection with the driver and fails after 2 minutes. The master endlessly spawns new executors without ever detecting the driver failure.
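The pause experiment can be reproduced against any process. A stand-in sketch, using `sleep` in place of the driver JVM (SIGSTOP freezes a process without killing it, so its sockets stay open; in the reported scenario that is why the master keeps treating the paused driver as alive):

```shell
# Stand-in for pausing the driver process with SIGSTOP.
sleep 600 &                 # pretend this is the driver JVM
driver_pid=$!

kill -STOP "$driver_pid"    # pause: process enters stopped state (ps STAT "T")
ps -o stat= -p "$driver_pid"
kill -CONT "$driver_pid"    # resume
kill "$driver_pid"          # clean up the stand-in
```

A stopped process is neither dead nor responsive, which is the gap the report points at: liveness checks based only on the connection being open will never fire, while every RPC into the process times out.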