Unsubscribing

2023-04-24 Thread phiroc
Hello,
does this mailing list have an administrator, please?
I'm trying to unsubscribe, but to no avail.
Many thanks.


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Spark equivalent to hdfs groups

2022-09-07 Thread phiroc
Many thanks, Sean.

- Original Message -
From: "Sean Owen"
To: phi...@free.fr
Cc: "User"
Sent: Wednesday, September 7, 2022 17:05:55
Subject: Re: Spark equivalent to hdfs groups


No, because this is a storage concept, and Spark is not a storage system. You 
would appeal to the tools and interfaces that the storage system provides, like 
hdfs. Where or how the hdfs binary is available depends on how and where you 
deploy Spark; it would be available on a Hadoop cluster. It's just not a Spark 
question.
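
If you really need this from the driver, one unofficial sketch is to go through
Hadoop's UserGroupInformation API, since the Hadoop client libraries ship with
Spark. No guarantees: the groups come from whatever group mapping is configured
on the machine where this runs, which may not match what the NameNode would
report, and "user1" is just a placeholder username.

// Sketch, not an official Spark API: resolve a user's groups through
// Hadoop's client library instead of shelling out to the hdfs binary.
import org.apache.hadoop.security.UserGroupInformation

val groups = UserGroupInformation.createRemoteUser("user1").getGroupNames // "user1" is a placeholder
groups.foreach(println) // groups as resolved by the locally configured group mapping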


On Wed, Sep 7, 2022 at 9:51 AM <phi...@free.fr> wrote: 


Hi Sean, 
I'm talking about HDFS Groups. 
On Linux, you can type "hdfs groups user1" to get the list of the groups 
user1 belongs to. 
In Zeppelin/Spark, the hdfs executable is not accessible. 
As a result, I wondered if there was a class in Spark (e.g., Security or ACL) 
which would let you access a particular user's groups. 



- Original Message -
From: "Sean Owen" <sro...@gmail.com>
To: phi...@free.fr
Cc: "User" <user@spark.apache.org>
Sent: Wednesday, September 7, 2022 16:41:01
Subject: Re: Spark equivalent to hdfs groups


Spark isn't a storage system or user management system; no, there is no notion 
of groups (groups for what?) 


On Wed, Sep 7, 2022 at 8:36 AM <phi...@free.fr> wrote: 


Hello, 
is there a Spark equivalent to "hdfs groups <username>"? 
Many thanks. 
Philippe 






Re: Spark equivalent to hdfs groups

2022-09-07 Thread phiroc
Hi Sean,
I'm talking about HDFS Groups.
On Linux, you can type "hdfs groups user1" to get the list of the groups 
user1 belongs to.
In Zeppelin/Spark, the hdfs executable is not accessible.
As a result, I wondered if there was a class in Spark (e.g., Security or ACL) 
which would let you access a particular user's groups.



- Original Message -
From: "Sean Owen"
To: phi...@free.fr
Cc: "User"
Sent: Wednesday, September 7, 2022 16:41:01
Subject: Re: Spark equivalent to hdfs groups


Spark isn't a storage system or user management system; no, there is no notion 
of groups (groups for what?) 


On Wed, Sep 7, 2022 at 8:36 AM <phi...@free.fr> wrote: 


Hello, 
is there a Spark equivalent to "hdfs groups <username>"? 
Many thanks. 
Philippe 






Spark equivalent to hdfs groups

2022-09-07 Thread phiroc
Hello,
is there a Spark equivalent to "hdfs groups <username>"?
Many thanks.
Philippe




Re: processing files

2014-11-21 Thread phiroc
Hi Simon,

no, I don't need to run the tasks on multiple machines for now.

I will therefore stick to a Makefile plus shell or Java programs, as Spark 
appears not to be the right tool for the tasks I am trying to accomplish.

Thank you for your input.

Philippe



- Original Message -
From: Simon Hafner <reactorm...@gmail.com>
To: Philippe de Rochambeau <phi...@free.fr>
Sent: Friday, November 21, 2014 09:47:25
Subject: Re: processing files

2014-11-21 1:46 GMT-06:00 Philippe de Rochambeau <phi...@free.fr>:
> - reads xml files in thousands of directories, two levels down, from year x
> to year y

You could try

sc.wholeTextFiles(s"$dirWithXML/*/*/*.xml") // glob over two directory levels; yields (path, content) pairs

... not guaranteed to work.
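Note that wholeTextFiles reads each file into memory as a single record, so
this assumes the individual XML files are small enough to fit.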

> - extracts data from image tags in those files and stores them in a SQL or
> NoSQL database

From what I understand, Spark expects no side effects from the functions you
pass to map(), so writing to the database there is probably not a good idea
if you don't want duplicated records.
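
If you do write from within tasks anyway, one hedged workaround is to key each
record deterministically (e.g. by source file path) so a re-executed task
overwrites instead of duplicating. Record and upsertIntoDb below are made-up
placeholders, not Spark APIs:

// Sketch: idempotent upserts survive task retries without duplicating rows.
case class Record(sourcePath: String, imageData: String)
parsed.foreachPartition { partition =>        // parsed: assumed to be an RDD[Record]
  partition.foreach(r => upsertIntoDb(r.sourcePath, r.imageData)) // upsertIntoDb: hypothetical DB helper
}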

> - generates ImageMagick commands based on the extracted data to generate
> images

A straightforward data transformation: map to command strings, then collect() and save.
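
For instance, a rough sketch, assuming the extracted data ended up as an RDD
of (src, dst) path pairs called images, with placeholder convert options:

// Build ImageMagick command lines, then persist them for later execution.
val cmds = images.map { case (src, dst) => s"convert $src -resize 800x600 $dst" }
cmds.saveAsTextFile("magick-commands") // or cmds.collect() and write one local script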

> - generates curl commands to index the image files with Solr

Same as with the ImageMagick commands.

> Does Spark provide any tools/features to facilitate and automate (batchify)
> the above tasks?

Sure, but I wouldn't run the commands with Spark itself; tasks may be
re-executed, so the commands might run twice or more.

> I can do all of the above with one or several Java programs, but I wondered
> if using Spark would be of any use in such an endeavour.

Personally, I'd use a Makefile with xmlstarlet for the XML parsing, store the
image paths in plain text instead of a database, and get parallelization via
make -j X. You could also run the ImageMagick and curl commands from there.
But that naturally doesn't scale to multiple machines.

Do you have more than one machine available to run this on? Do you need to
run it on more than one machine because it takes too long on just one?
That's what Spark excels at.
