Re: Local spark interpreter with extra java options

2021-07-11 Thread Lior Chaga
So after adding the quotes in both SparkInterpreterLauncher
and interpreter.sh, the interpreter is still failing with the same
"Unrecognized option" error.
The weird thing is that if I copy the command supposedly executed by
zeppelin (as it is printed to the log) and run it directly in a shell, the
interpreter process runs properly. So my guess is that the forked process
command that actually gets created is not identical to the one that is
logged.

This is what my command looks like (censored a bit):

/usr/local/spark/bin/spark-submit
--class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer
--driver-class-path
:/zeppelin/local-repo/spark/*:/zeppelin/interpreter/spark/*:::/zeppelin/interpreter/zeppelin-interpreter-shaded-0.10.0-SNAPSHOT.jar:/zeppelin/interpreter/spark/spark-interpreter-0.10.0-SNAPSHOT.jar:/etc/hadoop/conf

*--driver-java-options " -DSERVICENAME=zeppelin_docker
-Dfile.encoding=UTF-8
-Dlog4j.configuration=file:///zeppelin/conf/log4j.properties
-Dlog4j.configurationFile=file:///zeppelin/conf/log4j2.properties
-Dzeppelin.log.file=/var/log/zeppelin/zeppelin-interpreter-spark-shared_process--zeppelin-test-spark3-7d74d5df4-2g8x5.log"*
--conf spark.driver.host=10.135.120.245
--conf "spark.dynamicAllocation.minExecutors=1"
--conf "spark.shuffle.service.enabled=true"
--conf "spark.sql.parquet.int96AsTimestamp=true"
--conf "spark.ui.retainedTasks=1"
--conf "spark.executor.heartbeatInterval=600s"
--conf "spark.ui.retainedJobs=100"
--conf "spark.sql.ui.retainedExecutions=10"
--conf "spark.hadoop.cloneConf=true"
--conf "spark.debug.maxToStringFields=20"
--conf "spark.executor.memory=70g"
--conf "spark.executor.extraClassPath=../mysql-connector-java-8.0.18.jar:../guava-19.0.jar"
--conf "spark.hadoop.fs.permissions.umask-mode=000"
--conf "spark.memory.storageFraction=0.1"
--conf "spark.scheduler.mode=FAIR"
--conf "spark.sql.adaptive.enabled=true"
--conf "spark.master=mesos://zk://zk003:2181,zk004:2181,zk006:2181,/mesos-zeppelin"
--conf "spark.driver.memory=15g"
--conf "spark.io.compression.codec=lz4"
--conf "spark.executor.uri=
https://artifactory.company.com/artifactory/static/spark/spark-dist/spark-3.1.2.2-hadoop-2.7-zulu";
-
-conf "spark.ui.retainedStages=500"
--conf "spark.mesos.uris=
https://artifactory.company.com/artifactory/static/spark/spark-executor/jars/mysql-connector-java-8.0.18.jar,https://artifactory.company.com/artifactory/static/spark/spark-executor/jars/guava-19.0.jar";

--conf "spark.driver.maxResultSize=8g"
*--conf "spark.executor.extraJavaOptions=-DSERVICENAME=Zeppelin
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=2015
-XX:-OmitStackTraceInFastThrow -Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=55745
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false -verbose:gc
-Dlog4j.configurationFile=/etc/config/log4j2-executor-config.xml
-XX:+UseG1GC -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps -XX:+PrintFlagsFinal -XX:+PrintReferenceGC
-XX:+PrintGCDetails -XX:+PrintAdaptiveSizePolicy
-XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark
-XX:+PrintStringDeduplicationStatistics -XX:+UseStringDeduplication
-XX:InitiatingHeapOccupancyPercent=35
-Dhttps.proxyHost=proxy.service.consul -Dhttps.proxyPort=3128" *
--conf "spark.dynamicAllocation.enabled=true"
--conf "spark.default.parallelism=1200"
--conf "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2"
--conf "spark.hadoop.fs.AbstractFileSystem.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS"
--conf "spark.app.name=zeppelin_docker_spark3"
--conf "spark.shuffle.service.port=7337"
--conf "spark.memory.fraction=0.75"
--conf "spark.mesos.coarse=true"
--conf "spark.ui.port=4041"
--conf "spark.dynamicAllocation.executorIdleTimeout=60s"
--conf "spark.sql.shuffle.partitions=1200"
--conf "spark.sql.parquet.outputTimestampType=TIMESTAMP_MILLIS"
--conf "spark.dynamicAllocation.cachedExecutorIdleTimeout=120s"
--conf "spark.network.timeout=1200s"
--conf "spark.cores.max=600"
--conf "spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem"
--conf "spark.worker.timeout=15"
*--conf "spark.driver.extraJavaOptions=-Dhttps.proxyHost=proxy.service.consul
-Dhttps.proxyPort=3128
-Dlog4j.configuration=file:/usr/local/spark/conf/log4j.properties
-Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
-Djavax.jdo.option.ConnectionPassword=2eebb22277
-Djavax.jdo.option.ConnectionURL=jdbc:mysql://proxysql-backend.service.consul.company.com:6033/hms?useSSL=false&databaseTerm=SCHEMA&nullDatabaseMeansCurrent=true
-Djavax.jdo.option.ConnectionUserName=hms_rw" *
--conf "spark.files.overwrite=true"
/zeppelin/interpreter/spark/spark-interpreter-0.10.0-SNAPSHOT.jar
10.135.120.245
36419
spark-shared_process :



Error: Unrecognized option: -agentlib:jdwp=transport=dt_socket,server=y
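A minimal, hypothetical Java sketch of that logged-vs-forked discrepancy (not Zeppelin's actual launcher code): pasting the logged string into bash lets the shell strip the quotes and keep each quoted span together as a single argument, whereas rebuilding argv by splitting the same string on whitespace leaves the quote characters literal and tears the value apart.

import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: the logged command is one flat string, but the forked
// process receives an argv built by the launcher, so the two can tokenize
// the same text differently.
public class LoggedVsForked {
    public static void main(String[] args) {
        String logged =
            "--driver-java-options \"-DSERVICENAME=zeppelin_docker -Dfile.encoding=UTF-8\"";

        // Pasting the logged string into bash: the shell removes the quotes
        // and keeps everything inside them together as one argument.
        List<String> whatTheShellBuilds = Arrays.asList(
            "--driver-java-options",
            "-DSERVICENAME=zeppelin_docker -Dfile.encoding=UTF-8");

        // Rebuilding argv by splitting the same string on whitespace keeps
        // the quote characters and tears the quoted value apart.
        List<String> whatANaiveSplitBuilds = Arrays.asList(logged.split("\\s+"));

        System.out.println("shell argv : " + whatTheShellBuilds);
        System.out.println("split argv : " + whatANaiveSplitBuilds);
    }
}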

Re: Local spark interpreter with extra java options

2021-07-11 Thread Jeff Zhang
I believe this is because SparkInterpreterLauncher doesn't support
parameters that contain whitespace (it uses whitespace as the delimiter to
separate the parameters). This is a known issue.
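A rough, made-up illustration of that failure mode (not the real SparkInterpreterLauncher code): once the quoted --conf value is split on whitespace, everything after the first space reaches spark-submit as a separate argument, which would produce an "Unrecognized option" error like the one above.

import java.util.Arrays;
import java.util.List;

// Made-up demo of whitespace-delimited argument building; not the actual
// SparkInterpreterLauncher implementation.
public class ConfSplitDemo {
    public static void main(String[] args) {
        String conf = "spark.executor.extraJavaOptions=-DSERVICENAME=Zeppelin "
            + "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=2015 "
            + "-XX:-OmitStackTraceInFastThrow";

        // Intended argv: the whole key=value pair stays a single argument.
        List<String> intended = Arrays.asList("--conf", conf);

        // Whitespace-delimited argv: the value is cut at the first space,
        // so "-agentlib:jdwp=..." reaches spark-submit as its own argument
        // instead of staying inside the --conf value.
        List<String> split = Arrays.asList(("--conf " + conf).split("\\s+"));

        System.out.println("intended: " + intended);
        System.out.println("split   : " + split);
    }
}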

Lior Chaga wrote on Sun, Jul 11, 2021 at 4:14 PM:

> So after adding the quotes in both SparkInterpreterLauncher
> and interpreter.sh, the interpreter is still failing with the same
> "Unrecognized option" error.
> [...]

Re: Local spark interpreter with extra java options

2021-07-11 Thread Lior Chaga
Thanks Jeff,
So should I escape the whitespace? Is there a ticket for it? I couldn't
find one.

On Sun, Jul 11, 2021 at 1:10 PM Jeff Zhang  wrote:

> I believe this is because SparkInterpreterLauncher doesn't support
> parameters that contain whitespace (it uses whitespace as the delimiter to
> separate the parameters). This is a known issue.
>
> Lior Chaga wrote on Sun, Jul 11, 2021 at 4:14 PM:
>
>> So after adding the quotes in both SparkInterpreterLauncher
>> and interpreter.sh, the interpreter is still failing with the same
>> "Unrecognized option" error.
>> [...]