Re: Ubuntu 18.04: Docker: start-master.sh: command not found

2021-03-31 Thread JB Data31
Find the *start-master.sh* file with a command executed as root: *find / -name
"start-master.sh" -print*.
Check that the directory it is found in is actually on your $PATH.
As a first step, *cd* into the directory containing *start-master.sh* and run
it directly with *./start-master.sh*.
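
A minimal sketch of those steps (the /opt/spark/sbin location is an
assumption; use whatever directory *find* reports):

# locate the script, running as root
find / -name "start-master.sh" -print
# check whether that directory is already on the PATH
echo $PATH
# run the script directly from its own directory
cd /opt/spark/sbin
./start-master.sh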

@*JB*Δ 



On Wed, 31 Mar 2021 at 12:35, GUINKO Ferdinand wrote:

>
> I have exited from the container, and logged in using:
>
> sudo docker run -it -p8080:8080 ubuntu
>
> Then I tried to start the standalone Spark master server by doing:
>
> start-master.sh
>
> and got the following message:
>
> bash: start-master.sh: command not found
>
> So I went through setting the environment variables again, doing:
>
> echo "export SPARK_HOME=/opt/spark" >> ~/.profile
> export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/bin:/sbin:$SPARK_HOME/bin:$SPARK_HOME/sbin
> echo $PATH
> echo "export PYSPARK_PYTHON=/usr/bin/python3" >> ~/.profile
>
> Here is the output of the file .profile:
>
> root@291b0eb654ea:/# cat ~/.profile
> # ~/.profile: executed by Bourne-compatible login shells.
>
> if [ "$BASH" ]; then
>   if [ -f ~/.bashrc ]; then
> . ~/.bashrc
>   fi
> fi
>
> mesg n 2> /dev/null || true
> export SPARK_HOME=/opt/spark
> export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/bin:/sbin:/opt/spark/bin:/opt/spark/sbin
> export PYSPARK_PYTHON=/usr/bin/python3
>
> I also typed this command:
>
> root@291b0eb654ea:/# source ~/.profile
>
> I am still getting the following message:
>
> bash: start-master.sh: command not found
>
> What am I missing, please?
>
> --
> "The greatest want of the world is the want of men: men who will not be
> bought or sold, men who in their inmost souls are true and honest, men who
> do not fear to call sin by its right name, men whose conscience is as true
> to duty as the needle to the pole, men who will stand for the right though
> the heavens fall." Ellen Gould WHITE, Education, P. 55
>
> On Tuesday, 30 March 2021 21:01 GMT, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>
> Those two export lines mean: set SPARK_HOME and PATH *as environment
> variables* in the session you have in Ubuntu.
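>
> As a sketch, to make both settings survive new shells, append both exports
> to ~/.profile and reload it (this assumes Spark lives under /opt/spark, as
> in your output):
>
> echo 'export SPARK_HOME=/opt/spark' >> ~/.profile
> echo 'export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin' >> ~/.profile
> source ~/.profile
> which start-master.sh   # should now print /opt/spark/sbin/start-master.sh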
>
> Check this website for more info
>
> bash - How do I add environment variables? - Ask Ubuntu
> 
>
>
> If you are familiar with Windows, they are equivalent to Windows
> environment variables. For example, note SPARK_HOME:
>
>
> [image: Windows environment-variables dialog, with SPARK_HOME visible]
>
>
> Next: you are trying to start Spark in standalone mode. To learn more
> about it, check the following link:
>
> Spark Standalone Mode - Spark 3.1.1 Documentation (apache.org)
> 
>
> Also check the log file generated by invoking start-master.sh, shown in
> your output:
>
> starting org.apache.spark.deploy.master.Master, logging to
> */opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-23d865d7f117.out*
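>
> For example, a quick sketch to inspect it (the exact file name varies with
> the container hostname):
>
> tail -n 50 /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-*.out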
>
> HTH
>
>
>
>
> View my LinkedIn profile
> 
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On Tue, 30 Mar 2021 at 20:48, GUINKO Ferdinand <
> tonguimferdin...@guinko.net> wrote:
>
>> This is what I have now:
>>
>> root@33z261w1a18:/opt/spark# *export SPARK_HOME=/opt/spark*
>> root@33z261w1a18:/opt/spark# *export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/bin:/sbin:$SPARK_HOME/bin:$SPARK_HOME/sbin*
>> root@33z261w1a18:/opt/spark# *echo $PATH*
>>
>> /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/bin:/sbin:/opt/spark/bin:/opt/spark/sbin
>> root@33z261w1a18:/opt/spark# *start-master.sh*
>> starting org.apache.spark.deploy.master.Master, logging to
>> /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-23d865d7f117.out
>> root@33z261w1a18:/opt/spark#
>>
>> It seems that Spark was trying to start but couldn't.
>>
>> Please would you explain to me what the following lines mean and do:
>>
>> export SPARK_HOME=/opt/spark
>> export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/bin:/sbin:$SPARK_HOME/bin:$SPARK_HOME/sbin
>>
>> then
>>
>> echo $PATH
>>
>>
>> Thank you for the assistance.
>>

Re: How can I add extra mounted disk to HDFS

2020-04-28 Thread JB Data31
Use the Hadoop NFSv3 gateway to mount the FS.
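
A rough sketch, assuming a Hadoop 3.x install (these are the stock HDFS NFS
gateway daemons; the host and mount point are placeholders):

# on the gateway node
hdfs --daemon start portmap
hdfs --daemon start nfs3
# on the client, mount the whole HDFS namespace over NFSv3
mount -t nfs -o vers=3,proto=tcp,nolock,noacl,sync <namenode-host>:/ /mnt/hdfs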

@*JB*Δ 



On Tue, 28 Apr 2020 at 23:18, Chetan Khatri wrote:

> Hi Spark Users,
>
> My Spark job gave me an error: No space left on the device
>


Re: [spark on yarn] spark on yarn without DFS

2019-05-20 Thread JB Data31
There is a kind of check in *yarn-site.xml*:


<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/var/yarn/logs</value>
</property>

Using *hdfs://:9000* as *fs.defaultFS* in *core-site.xml*, you have to *hdfs
dfs -mkdir /var/yarn/logs*.
Using *S3://* as *fs.defaultFS*...
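
As a sketch, creating that directory when HDFS is the default FS (the
yarn:hadoop ownership is an assumption; match your cluster's YARN user):

hdfs dfs -mkdir -p /var/yarn/logs
hdfs dfs -chown yarn:hadoop /var/yarn/logs   # assuming YARN runs as 'yarn'
hdfs dfs -chmod 1777 /var/yarn/logs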

Take care of the *.dir* properties in *hdfs-site.xml*. They must point to
local or S3 values.

Curious to see *YARN* working without *DFS*.

@*JB*Δ 

On Mon, 20 May 2019 at 09:54, Hariharan wrote:

> Hi Huizhe,
>
> You can set the "fs.defaultFS" field in core-site.xml to some path on S3.
> That way your Spark job will use S3 for all operations that need HDFS.
> Intermediate data will still be stored on local disk though.
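>
> As an illustrative sketch (the bucket name, credentials and job script are
> placeholders, and it assumes the hadoop-aws/s3a jars are on the classpath),
> the same setting can also be passed per job via Spark's Hadoop conf
> passthrough instead of editing core-site.xml:
>
> spark-submit \
>   --master yarn \
>   --conf spark.hadoop.fs.defaultFS=s3a://my-bucket \
>   --conf spark.hadoop.fs.s3a.access.key=... \
>   --conf spark.hadoop.fs.s3a.secret.key=... \
>   my_job.py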
>
> Thanks,
> Hari
>
> On Mon, May 20, 2019 at 10:14 AM Abdeali Kothari 
> wrote:
>
>> While Spark can read from S3 directly in EMR, I believe it still needs
>> HDFS to perform shuffles and to write intermediate data to disk when
>> doing jobs (i.e. when in-memory data needs to spill over to disk).
>>
>> For these operations, Spark does need a distributed file system; you
>> could use something like EMRFS (which is like an HDFS backed by S3) on
>> Amazon.
>>
>> The issue could be something else too, so a stack trace or error message
>> would help in understanding the problem.
>>
>>
>>
>> On Mon, May 20, 2019, 07:20 Huizhe Wang  wrote:
>>
>>> Hi,
>>>
>>> I want to use Spark on YARN without HDFS. I store my resources in AWS and
>>> use s3a to get them. However, when I used stop-dfs.sh it stopped the
>>> NameNode and DataNode, and I got an error when using yarn cluster mode.
>>> Can I use YARN without starting DFS, and how would I use this mode?
>>>
>>> Yours,
>>> Jane
>>>
>>


Re: Masking username in Spark with regexp_replace and reverse functions

2019-03-17 Thread JB Data31
Hi,

Why not add a random regexp in the regexp substitution? For example:
https://onlinerandomtools.com/generate-random-data-from-regexp

@*JB*Δ 



On Sat, 16 Mar 2019 at 18:39, Mich Talebzadeh wrote:

> Hi,
>
> I am looking at the Description column of a bank statement (CSV download)
> that has the following format:
>
> scala> account_table.printSchema
> root
>  |-- TransactionDate: date (nullable = true)
>  |-- TransactionType: string (nullable = true)
>  |-- Description: string (nullable = true)
>  |-- Value: double (nullable = true)
>  |-- Balance: double (nullable = true)
>  |-- AccountName: string (nullable = true)
>  |-- AccountNumber: string (nullable = true)
>
> The Description column for BACS payments contains the name of the
> individual who paid into the third-party account. I need to mask the name,
> but I cannot simply use a literal as below for all contents of the
> Description column!
>
> f1.withColumn("Description", lit("*** Masked
> ***")).select('Description.as("Who paid")
>
> So I try the following combination
>
> f1.select(trim(substring(substring_index('Description, ",",
> 1),2,50)).as("name in clear"),
> reverse(regexp_replace(regexp_replace(regexp_replace(substring(regexp_replace('Description,
> "^['A-Z]", "XX"),2,6),"[A-F]","X")," ","X"),"[,]","R")).as("Masked")).show
> +-------------+------+
> |name in clear|Masked|
> +-------------+------+
> |  FATAH SABAH|HXTXXX|
> |  C HIGGINSON|GIHXXX|
> |      SOLTA A|XTLOSX|
> +-------------+------+
>
> This seems to work as it not only masks the name but also makes it
> consistent for all names (in other words, the same username gets the same
> mask).
>
> Are there any better alternatives?
>
> Thanks
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>