[jira] [Commented] (HDFS-12571) Ozone: remove spaces from the beginning of the hdfs script

2020-06-04 Thread Jialin Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17125635#comment-17125635
 ] 

Jialin Liu commented on HDFS-12571:
---

As of June 14, 2020, I'm still seeing the same issue:
{code:bash}
sudo bash start-all.sh 
Password:
Starting namenodes on [localhost]
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-functions.sh: 
line 398: syntax error near unexpected token `<'
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-functions.sh: 
line 398: `  done < <(for text in "${input[@]}"; do'
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 
70: hadoop_deprecate_envvar: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 
87: hadoop_bootstrap: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 
104: hadoop_parse_args: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 
105: shift: : numeric argument required
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 
249: hadoop_need_reexec: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 
257: hadoop_verify_user_perm: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/hdfs: line 218: 
hadoop_validate_classname: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/hdfs: line 219: 
hadoop_exit_with_usage: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 
268: hadoop_add_client_opts: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 
275: hadoop_subcommand_opts: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 
278: hadoop_generic_java_subcmd_handler: command not found
Starting datanodes
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-functions.sh: 
line 398: syntax error near unexpected token `<'
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-functions.sh: 
line 398: `  done < <(for text in "${input[@]}"; do'
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 
70: hadoop_deprecate_envvar: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 
87: hadoop_bootstrap: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 
104: hadoop_parse_args: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 
105: shift: : numeric argument required
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 
249: hadoop_need_reexec: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 
257: hadoop_verify_user_perm: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/hdfs: line 218: 
hadoop_validate_classname: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/hdfs: line 219: 
hadoop_exit_with_usage: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 
268: hadoop_add_client_opts: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 
275: hadoop_subcommand_opts: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 
278: hadoop_generic_java_subcmd_handler: command not found
Starting secondary namenodes [Jialin.local]
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-functions.sh: 
line 398: syntax error near unexpected token `<'
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-functions.sh: 
line 398: `  done < <(for text in "${input[@]}"; do'
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 
70: hadoop_deprecate_envvar: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 
87: hadoop_bootstrap: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 
104: hadoop_parse_args: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 
105: shift: : numeric argument required
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 
249: hadoop_need_reexec: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 
257: hadoop_verify_user_perm: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/hdfs: line 218: 
hadoop_validate_classname: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/hdfs: line 219: 
hadoop_exit_with_usage: command not found
/usr/local/Cellar/hadoop/3.2.1_1/libexec/bin/../libexec/hadoop-config.sh: line 
268: hadoop_add_client_opts: command not found
{code}

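For context, the construct that line 398 of hadoop-functions.sh trips over is bash process substitution; once the file aborts at that parse error, none of its functions get defined, which produces the cascade of `command not found` errors above. A minimal sketch of the construct (not Hadoop's actual code), run explicitly under bash:

```shell
# Process substitution (< <(...)) feeds the loop without a pipe, so
# variables set inside the loop survive. Under POSIX sh the same text
# is a syntax error, exactly as the log shows.
bash -c '
  result=""
  while read -r line; do
    result="${result}${line},"
  done < <(printf "a\nb\nc\n")
  echo "$result"
'
```

Running `bash --version` on the shell that actually sources these scripts is a quick way to check whether an old or non-bash interpreter is being picked up.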
[jira] [Commented] (SPARK-26261) Spark does not check completeness temporary file

2018-12-05 Thread Jialin LIu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710883#comment-16710883
 ] 

Jialin LIu commented on SPARK-26261:


Our initial test is:

We start a word-count workflow with persisting blocks to disk enabled. After 
confirming that some blocks are already on disk, we use the truncate command 
to cut off part of a block file. We then compare the result against the result 
produced by the same workflow without fault injection.
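As a sketch, the fault-injection step looks like this (the file name is illustrative, not an actual Spark blockmgr path; `truncate` is GNU coreutils):

```shell
# Simulate the fault: write a "block" file, then truncate it mid-content.
block=$(mktemp)
printf 'to be or not to be' > "$block"
truncate -s 5 "$block"    # keep only the first 5 bytes
cat "$block"; echo        # prints "to be"
rm -f "$block"
```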

> Spark does not check completeness temporary file 
> -
>
> Key: SPARK-26261
> URL: https://issues.apache.org/jira/browse/SPARK-26261
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2
>Reporter: Jialin LIu
>Priority: Minor
>
> Spark does not check the completeness of temporary files. When persisting to 
> disk is enabled on some RDDs, a number of temporary files are created in the 
> blockmgr folder. The block manager can detect missing blocks, but it cannot 
> detect file content that is modified during execution. 
> Our initial test shows that if we truncate a block file before it is used by 
> the executors, the program finishes without detecting any error, but the 
> result is totally wrong.
> We believe there should be a checksum on every RDD block file, so that these 
> files are protected against silent corruption.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26261) Spark does not check completeness temporary file

2018-12-03 Thread Jialin LIu (JIRA)
Jialin LIu created SPARK-26261:
--

 Summary: Spark does not check completeness temporary file 
 Key: SPARK-26261
 URL: https://issues.apache.org/jira/browse/SPARK-26261
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.3.2
Reporter: Jialin LIu


Spark does not check the completeness of temporary files. When persisting to 
disk is enabled on some RDDs, a number of temporary files are created in the 
blockmgr folder. The block manager can detect missing blocks, but it cannot 
detect file content that is modified during execution.

Our initial test shows that if we truncate a block file before it is used by 
the executors, the program finishes without detecting any error, but the 
result is totally wrong.

We believe there should be a checksum on every RDD block file, so that these 
files are protected against silent corruption.
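The proposed protection can be sketched as a checksum sidecar per block file; the file names and workflow here are illustrative, not Spark's actual block manager layout:

```shell
# Record a checksum when the block is written; verify before it is read.
dir=$(mktemp -d)
printf 'rdd block data' > "$dir/rdd_2_0"
sha256sum "$dir/rdd_2_0" > "$dir/rdd_2_0.sha256"     # write-time checksum

sha256sum -c --quiet "$dir/rdd_2_0.sha256" && echo "block ok"

truncate -s 4 "$dir/rdd_2_0"                         # inject the fault
sha256sum -c --quiet "$dir/rdd_2_0.sha256" 2>/dev/null || echo "block corrupted"
rm -rf "$dir"
```

Here verification fails after the truncation, which is exactly the detection the issue asks for.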






[jira] [Created] (SPARK-26197) Spark master fails to detect driver process pause

2018-11-27 Thread Jialin LIu (JIRA)
Jialin LIu created SPARK-26197:
--

 Summary: Spark master fails to detect driver process pause
 Key: SPARK-26197
 URL: https://issues.apache.org/jira/browse/SPARK-26197
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.3.2
Reporter: Jialin LIu


I was using Spark 2.3.2 with a standalone cluster, submitting the job in 
cluster mode. After submitting the job, I deliberately paused the driver 
process (through the shell command "kill -stop (driver process id)") to see 
whether the master could detect this problem. The result shows that the pause 
is never detected. All the executors try to talk back to the driver and give 
up after 10 minutes. The master detects the executor failures and reassigns 
new executor processes to redo the job. Each new executor tries to create an 
RPC connection with the driver and fails after 2 minutes. The master endlessly 
spawns new executors without ever detecting the driver failure.
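The pause itself is easy to reproduce in isolation: SIGSTOP suspends a process without closing its sockets, which is why its peers only notice through their own timeouts. A minimal stand-in (using `sleep` in place of the driver JVM):

```shell
sleep 60 &              # stand-in for the driver process
pid=$!
kill -STOP "$pid"       # pause: the process keeps its PID and sockets
sleep 1
ps -o stat= -p "$pid"   # state begins with "T" (stopped)
kill -CONT "$pid"       # resume
kill "$pid"
```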


