Re: Write DataFrame with Partition and choose Filename in PySpark

2023-05-05 Thread Marco Costantini

Re: Write DataFrame with Partition and choose Filename in PySpark

2023-05-04 Thread Marco Costantini

Write DataFrame with Partition and choose Filename in PySpark

2023-05-04 Thread Marco Costantini
Hello, I am testing writing my DataFrame to S3 using the DataFrame `write` method. It mostly does a great job. However, it fails one of my requirements. Here are my requirements:
- Write to S3
- use `partitionBy` to automatically make folders based on my chosen partition columns
- control the
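A minimal sketch of such a write, assuming hypothetical paths and partition columns. Spark names the part files inside each partition folder itself; one common workaround for controlling filenames is to rename them after the write through the JVM Hadoop FileSystem API (this sketch assumes exactly one part file per partition, e.g. after repartitioning by the partition columns):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("s3://my-bucket/input/")  # hypothetical source

# partitionBy creates one folder per value of the chosen columns;
# the part-* filenames inside them are chosen by Spark, not by us.
df.write.mode("overwrite").partitionBy("year", "month").json("s3://my-bucket/output/")

# Workaround: rename each part file afterwards via the Hadoop FileSystem API.
jvm = spark._jvm
conf = spark._jsc.hadoopConfiguration()
fs = jvm.org.apache.hadoop.fs.Path("s3://my-bucket/output/").getFileSystem(conf)
for status in fs.globStatus(jvm.org.apache.hadoop.fs.Path("s3://my-bucket/output/*/*/part-*")):
    src = status.getPath()
    fs.rename(src, jvm.org.apache.hadoop.fs.Path(src.getParent(), "orders.json"))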

Re: Write custom JSON from DataFrame in PySpark

2023-05-04 Thread Marco Costantini
> df2.write.json("data.json")
> {"id":1,"stuff":{"datA":"a1"}}
> {"id":2,"stuff":{"datA":"a2"}}
> {"id":3,"stuff":{"datA":"a3"}}
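A minimal sketch that reproduces the quoted output, nesting datA under a "stuff" struct before writing (the input df is the one from the original question below):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a1"), (2, "a2"), (3, "a3")], ["id", "datA"])

# Wrap datA in a struct column named "stuff", then write line-delimited JSON.
df2 = df.select("id", F.struct("datA").alias("stuff"))
df2.write.json("data.json")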

Write custom JSON from DataFrame in PySpark

2023-05-03 Thread Marco Costantini
Hello, Let's say I have a very simple DataFrame, as below.
+---+----+
| id|datA|
+---+----+
|  1|  a1|
|  2|  a2|
|  3|  a3|
+---+----+
Let's say I have a requirement to write this to a bizarre JSON structure. For example: { "id": 1, "stuff": { "datA": "a1" } } How can I achieve

Re: What is the best way to organize a join within a foreach?

2023-04-26 Thread Marco Costantini

Re: What is the best way to organize a join within a foreach?

2023-04-25 Thread Marco Costantini
order|108.11| 108.11|
> |Mich| 50009| Mich's 9th order|109.11| 109.11|
> |Mich| 50010|Mich's 10th order|210.11| 210.11|
> +----+------+-----------------+------+-------+
>
> You can start on this. Happy coding
>
> Mich Talebzadeh,
> Lead Solutions Archi

Re: What is the best way to organize a join within a foreach?

2023-04-25 Thread Marco Costantini
> On Tue, 25 Apr 2023 at 14:07, Marco Costantini < marco.costant...@rocketfncl.com> wrote: >> Thanks Mich,

Re: What is the best way to organize a join within a foreach?

2023-04-25 Thread Marco Costantini
> On Tue, 25 Apr 2023 at 00:15, Marco Costantini < marco.costant...@rocketfncl.com> wrote: >> I have two tables: {users, orders}. In this example, let's say that for each 1 User

What is the best way to organize a join within a foreach?

2023-04-24 Thread Marco Costantini
I have two tables: {users, orders}. In this example, let's say that for each 1 User in the users table, there are 10 Orders in the orders table. I have to use pyspark to generate a statement of Orders for each User. So, a single user will need his/her own list of Orders. Additionally, I need
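One way to avoid a join inside the loop is to join once, gather each user's orders into a single row, and only then iterate. A minimal sketch, with hypothetical paths and column names:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
users = spark.read.parquet("s3://my-bucket/users/")    # hypothetical
orders = spark.read.parquet("s3://my-bucket/orders/")  # hypothetical

# One row per user, carrying that user's full list of orders.
statements = (
    users.join(orders, "user_id")
         .groupBy("user_id", "name")
         .agg(F.collect_list(F.struct("order_id", "amount")).alias("orders"))
)

def handle_partition(rows):
    # Runs on the executors; open any connection once per partition, not per row.
    for row in rows:
        pass  # e.g. render and send the statement for row["orders"] here

statements.foreachPartition(handle_partition)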

Re: AWS Spark-ec2 script with different user

2014-04-09 Thread Marco Costantini
to it. On Wed, Apr 9, 2014 at 11:08 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Marco, If you call spark-ec2 launch without specifying an AMI, it will default to the Spark-provided AMI. Nick On Wed, Apr 9, 2014 at 9:43 AM, Marco Costantini silvio.costant...@granatads.com wrote

Re: AWS Spark-ec2 script with different user

2014-04-08 Thread Marco Costantini
to the community to know that the root user work-around does/doesn't work any more for paravirtual instances. Thanks, Marco. On Tue, Apr 8, 2014 at 9:51 AM, Marco Costantini silvio.costant...@granatads.com wrote: As requested, here is the script I am running. It is a simple shell script which

Re: AWS Spark-ec2 script with different user

2014-04-08 Thread Marco Costantini
I was able to keep the workaround ...around... by overwriting the generated '/root/.ssh/authorized_keys' file with a known good one, in the '/etc/rc.local' file On Tue, Apr 8, 2014 at 10:12 AM, Marco Costantini silvio.costant...@granatads.com wrote: Another thing I didn't mention. The AMI
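A sketch of that workaround: a line in /etc/rc.local that restores a known-good file at every boot (the source path for the known-good copy is hypothetical):

# /etc/rc.local -- restore a known-good authorized_keys for root at boot
cp /home/ec2-user/.ssh/authorized_keys.good /root/.ssh/authorized_keys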

AWS Spark-ec2 script with different user

2014-04-07 Thread Marco Costantini
Hi all, On the old Amazon Linux EC2 images, the user 'root' was enabled for ssh. Also, it is the default user for the Spark-EC2 script. Currently, the Amazon Linux images have an 'ec2-user' set up for ssh instead of 'root'. I can see that the Spark-EC2 script allows you to specify which user to
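For reference, a hypothetical launch using the script's user option as described in this thread (key pair, identity file, and cluster name are made up):

./spark-ec2 --key-pair=my-key --identity-file=my-key.pem \
    --user=ec2-user launch my-cluster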

Re: AWS Spark-ec2 script with different user

2014-04-07 Thread Marco Costantini
we build should have root ssh access -- Do you find this not to be the case? You can also enable root ssh access in a vanilla AMI by editing /etc/ssh/sshd_config and setting PermitRootLogin to yes. Thanks Shivaram On Mon, Apr 7, 2014 at 11:14 AM, Marco Costantini silvio.costant
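A sketch of the vanilla-AMI fix Shivaram describes, run as root:

# Enable root SSH logins, then restart sshd.
sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin yes/' /etc/ssh/sshd_config
service sshd restart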