Re: Issues getting Apache Spark

2022-05-26 Thread Apostolos N. Papadopoulos
How can we help if we do not know what the problem is? What error are you getting, and at which step? Please give us more info so we can help you. Spark installation on Linux/Windows is easy if you follow the guidelines exactly. Regards, Apostolos On 26/5/22 22:19, Martin, Michael

Issues getting Apache Spark

2022-05-26 Thread Martin, Michael
Hello, I'm writing to request assistance in getting Apache Spark running on my laptop. I've followed instructions telling me to get Java, Python, Hadoop, Winutils, and Spark itself, and instructions illustrating how to set my environment variables. For some reason, I still cannot get Spark
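For a Windows setup like the one described above, the environment variables can also be set from Python before starting Spark. This is a minimal sketch only; every path below is an assumption and must be pointed at wherever Spark and winutils.exe were actually unpacked.

```python
import os

# Hypothetical install locations -- adjust for your machine.
os.environ["SPARK_HOME"] = r"C:\spark\spark-3.2.1-bin-hadoop3.2"
os.environ["HADOOP_HOME"] = r"C:\hadoop"  # must contain bin\winutils.exe
os.environ["PYSPARK_PYTHON"] = "python"

# Prepend the Spark and Hadoop bin directories to PATH so that
# spark-submit and winutils.exe resolve.
os.environ["PATH"] = os.pathsep.join([
    os.path.join(os.environ["SPARK_HOME"], "bin"),
    os.path.join(os.environ["HADOOP_HOME"], "bin"),
    os.environ.get("PATH", ""),
])
print(os.environ["SPARK_HOME"])
```

Setting the same variables in the Windows system settings is equivalent; the point is only that SPARK_HOME, HADOOP_HOME, and PATH must all agree before a SparkSession is created.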

Re: Complexity with the data

2022-05-26 Thread Sid
Hi Gourav, Please find the below link for a detailed understanding. https://stackoverflow.com/questions/72389385/how-to-load-complex-data-using-pyspark/72391090#72391090 @Bjørn Jørgensen : I was able to read that kind of data using the code below:

Re: Complexity with the data

2022-05-26 Thread Gourav Sengupta
Hi, can you please give us a simple map of what the input is and what the output should look like? From your description it is a bit difficult to figure out exactly how you want the records parsed. Regards, Gourav Sengupta On Wed, May 25, 2022 at 9:08 PM Sid wrote: >

java.lang.NoSuchMethodError: org.apache.hadoop.hive.common.FileUtils.mkdir --> Spark to Hive

2022-05-26 Thread Prasanth M Sasidharan
Hi Team, I am trying to persist data into a Hive table through PySpark. Following is the line of code where it's throwing the error sparkSession =
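A `java.lang.NoSuchMethodError` at runtime usually means two incompatible versions of the same library ended up on the classpath; a quick check is to scan `$SPARK_HOME/jars` for Hive/Hadoop jars and compare their versions. The sketch below uses a simulated jars directory with hypothetical file names so it runs anywhere; on a real install you would point it at the actual jars directory.

```python
import re
import tempfile
from pathlib import Path

# Simulated Spark jars directory (file names are hypothetical) standing
# in for $SPARK_HOME/jars.
jars_dir = Path(tempfile.mkdtemp())
for name in ["hive-common-2.3.9.jar", "hive-exec-1.2.1.jar",
             "hadoop-common-3.3.1.jar"]:
    (jars_dir / name).touch()

# Extract the version suffix from each hive-*.jar file name.
hive_versions = set()
for jar in sorted(jars_dir.glob("hive-*.jar")):
    m = re.match(r"hive-.+?-(\d[\d.]*)\.jar$", jar.name)
    if m:
        hive_versions.add(m.group(1))

# More than one Hive version on the classpath is a red flag for
# NoSuchMethodError-style failures.
print(sorted(hive_versions))  # ['1.2.1', '2.3.9']
```

If a mismatch shows up, the usual remedies are aligning the Hive jars with the ones Spark ships, or setting `spark.sql.hive.metastore.version` and `spark.sql.hive.metastore.jars` to a consistent pair.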

Re: Complexity with the data

2022-05-26 Thread Bjørn Jørgensen
Yes, but how do you read it with Spark? On Thu, 26 May 2022 at 18:30, Sid wrote: > I am not reading it through pandas. I am using Spark because when I tried > to use the pandas that comes under import pyspark.pandas, it gives me an > error. > > On Thu, May 26, 2022 at 9:52 PM Bjørn Jørgensen > wrote:

Re: Complexity with the data

2022-05-26 Thread Sid
I am not reading it through pandas. I am using Spark because when I tried to use the pandas that comes under import pyspark.pandas, it gives me an error. On Thu, May 26, 2022 at 9:52 PM Bjørn Jørgensen wrote: > ok, but how do you read it now? > > >

Fwd: java.lang.NoSuchMethodError: org.apache.hadoop.hive.common.FileUtils.mkdir --> Spark to Hive

2022-05-26 Thread Prasanth M Sasidharan
Hi Team, I am trying to persist data into a Hive table through PySpark. Following is the line of code where it's throwing the error sparkSession =

Re: Complexity with the data

2022-05-26 Thread Bjørn Jørgensen
ok, but how do you read it now? https://github.com/apache/spark/blob/8f610d1b4ce532705c528f3c085b0289b2b17a94/python/pyspark/pandas/namespace.py#L216 probably has to be updated with the default options, so that the pandas API on Spark behaves like pandas. On Thu, 26 May 2022 at 17:38,

Re: Complexity with the data

2022-05-26 Thread Sid
I was passing the wrong escape characters, which is why I was facing the issue. I have updated the user's answer on my post. Now I am able to load the dataset. Thank you everyone for your time and help! Much appreciated. I have more datasets like this. I hope that would be resolved using this
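The wrong-escape-character problem Sid describes can be illustrated with the standard-library csv module: the same bytes parse into a different number of columns depending on the quote/escape convention the reader assumes. The sample row here is invented for illustration; Spark's DataFrameReader exposes the analogous `quote` and `escape` options.

```python
import csv
import io

# A quoted field that itself contains a quote and a comma. RFC-4180 style
# data escapes the inner quote by doubling it ("").
raw = '1,"He said ""hi, there""",ok\n'

# With settings matching the data, the row parses into exactly three fields.
rows = list(csv.reader(io.StringIO(raw), quotechar='"', doublequote=True))
print(rows)  # [['1', 'He said "hi, there"', 'ok']]

# The Spark equivalent (not run here) would be along the lines of:
# spark.read.option("quote", '"').option("escape", '"').csv(path)
```

With a mismatched escape setting, the inner quotes terminate the field early and the embedded comma produces a spurious extra column, which is exactly the "complex data" symptom in this thread.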

Re: Complexity with the data

2022-05-26 Thread Apostolos N. Papadopoulos
Since you cannot create the DF directly, you may try to first create an RDD of tuples from the file and then convert the RDD to a DF using the toDF() transformation. Perhaps you can bypass the issue this way. Another thing I have seen in the example is that you are using "" as an
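A minimal sketch of the RDD-of-tuples route suggested above: parse each raw line into a tuple in plain Python first, then hand the tuples to Spark. The parsing step needs no cluster, so it is shown runnable; the delimiter, file name, and column names are assumptions for illustration only.

```python
def parse_line(line: str) -> tuple:
    # Naive pipe split; data with quoted delimiters needs the csv module
    # (or the quote/escape handling discussed elsewhere in this thread).
    return tuple(part.strip() for part in line.split("|"))

print(parse_line("1| foo |bar"))  # ('1', 'foo', 'bar')

# With a SparkSession in hand (not run here):
# rdd = spark.sparkContext.textFile("data.txt").map(parse_line)
# df = rdd.toDF(["id", "name", "value"])
```

The advantage of this route is that the tricky parsing happens in code you fully control, so the DataFrame reader's option handling is bypassed entirely.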

Re: Complexity with the data

2022-05-26 Thread Sid
Thanks for opening the issue, Bjørn. However, could you help me address the problem for now with some kind of alternative? I have been stuck on this since yesterday. Thanks, Sid On Thu, 26 May 2022, 18:48 Bjørn Jørgensen, wrote: > Yes, it looks like a bug that we also have in pandas API

Re: Complexity with the data

2022-05-26 Thread Bjørn Jørgensen
Yes, it looks like a bug that we also have in the pandas API on Spark, so I have opened a JIRA for it. On Thu, 26 May 2022 at 11:09, Sid wrote: > Hello Everyone, > > I have posted a question finally with the dataset and the column names. > > PFB

Re: Complexity with the data

2022-05-26 Thread Sid
Hello Everyone, I have finally posted a question with the dataset and the column names. PFB link: https://stackoverflow.com/questions/72389385/how-to-load-complex-data-using-pyspark Thanks, Sid On Thu, May 26, 2022 at 2:40 AM Bjørn Jørgensen wrote: > Sid, dump one of yours files. > >