Re: Dealing with large number of small files

2022-04-26 Thread Sid
I have .txt files with JSON inside them. They are generated by some API calls made by the client. On Wed, Apr 27, 2022 at 12:39 AM Bjørn Jørgensen wrote: > What is that you have? Is it txt files or json files? > Or do you have txt files with JSON inside? > > > > On Tue, Apr 26, 2022 at 20:41, Sid wrote: >

Re: Dealing with large number of small files

2022-04-26 Thread Bjørn Jørgensen
What is it that you have? Are they txt files or json files? Or do you have txt files with JSON inside? On Tue, Apr 26, 2022 at 20:41, Sid wrote: > Thanks for your time, everyone :) > > Much appreciated. > > I solved it using jq utility since I was dealing with JSON. I have solved > it using below

Re: Dealing with large number of small files

2022-04-26 Thread Sid
Thanks for your time, everyone :) Much appreciated. Since I was dealing with JSON, I solved it with the jq utility, using the script below: find . -name '*.txt' -exec cat '{}' + | jq -s '.' > output.txt Thanks, Sid On Tue, Apr 26, 2022 at 9:37 PM Bjørn Jørgensen wrote: > and the
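For readers without jq, the pipeline above can be approximated in Python. This is a minimal sketch, not the script Sid ran: the function names `slurp_json_values` and `merge_txt_files` are hypothetical, and it assumes every .txt file holds one or more complete JSON values encoded as UTF-8. Like `jq -s '.'`, it collects every JSON value it finds into a single array.

```python
import json
from pathlib import Path


def slurp_json_values(text):
    """Parse a string holding zero or more concatenated JSON values into a list."""
    decoder = json.JSONDecoder()
    values = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)
    return values


def merge_txt_files(root, output_path):
    """Collect every JSON value from *.txt files under root into one JSON array.

    Returns the number of values written. Both names are illustrative only.
    """
    merged = []
    for path in sorted(Path(root).rglob("*.txt")):
        merged.extend(slurp_json_values(path.read_text(encoding="utf-8")))
    Path(output_path).write_text(json.dumps(merged), encoding="utf-8")
    return len(merged)
```

A call such as `merge_txt_files(".", "merged.json")` would mirror the shell command. The sketch writes to a `.json` name so that a second run does not pick up its own output through the `*.txt` pattern.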

Re: Dealing with large number of small files

2022-04-26 Thread Bjørn Jørgensen
And the bash script seems to read txt files, not json: for f in Agent/*.txt; do cat "${f}" >> merged.json; done On Tue, Apr 26, 2022 at 18:03, Gourav Sengupta < gourav.sengu...@gmail.com> wrote: > Hi, > > what is the version of spark are you using? And where is the data stored. > > I am not quite

Re: Dealing with large number of small files

2022-04-26 Thread Gourav Sengupta
Hi, what version of Spark are you using? And where is the data stored? I am not sure that a bash script alone will help, because simply concatenating all the files into a single file does not necessarily produce valid JSON. Regards, Gourav On Tue, Apr 26, 2022 at 3:44 PM Sid wrote: > Hello, > > Can
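Whether plain concatenation yields valid JSON can be checked directly. The sketch below uses two made-up single-object documents (the `id` field is an assumption for illustration) and Python's standard `json` module; it is not taken from the thread.

```python
import json

first = json.dumps({"id": 1})
second = json.dumps({"id": 2})

# Appending one document to another, as `cat a.txt b.txt` would do.
concatenated = first + "\n" + second


def is_valid_json(text):
    """Return True if text parses as exactly one JSON value."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json(first))         # True
print(is_valid_json(concatenated))  # False: extra data after the first object
```

Two objects back to back are two JSON values, not one, so a strict parser rejects the merged file unless the values are wrapped in an array or read line by line.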

Re: Dealing with large number of small files

2022-04-26 Thread Artemis User
Most likely your JSON files are not formatted correctly. Please see the Spark documentation on the specific formatting requirements for JSON data: https://spark.apache.org/docs/latest/sql-data-sources-json.html On 4/26/22 10:43 AM, Sid wrote: Hello, Can somebody help me with the below problem?
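The linked page states that, by default, Spark expects each line of a JSON file to be a separate, self-contained JSON object (the JSON Lines layout), with a `multiLine` option for regular multi-line documents. Assuming the small files hold pretty-printed objects or arrays of objects, a conversion to that layout might look like the sketch below. The helper names are hypothetical and the sample record is invented.

```python
import json


def to_json_lines(records):
    """Serialize each record as compact JSON on its own line (JSON Lines)."""
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


def document_to_json_lines(text):
    """Convert one JSON document (an object or an array of objects) to JSON Lines."""
    data = json.loads(text)
    records = data if isinstance(data, list) else [data]
    return to_json_lines(records)


# A pretty-printed document spans several lines, which the default reader
# would not treat as one record per line.
pretty = json.dumps({"id": 1, "tags": ["a", "b"]}, indent=2)
print(document_to_json_lines(pretty), end="")
```

Because `json.dumps` without `indent` never emits a raw newline (newlines inside strings are escaped), every record ends up on exactly one line. A file written this way should be readable with `spark.read.json(path)` without extra options, though that has not been verified against the poster's data.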

Re: Dealing with large number of small files

2022-04-26 Thread Bjørn Jørgensen
Use the *.json glob: df = spark.read.json("/*.json") On Tue, Apr 26, 2022 at 16:44, Sid wrote: > Hello, > > Can somebody help me with the below problem? > > > https://stackoverflow.com/questions/72015557/dealing-with-large-number-of-small-json-files-using-pyspark > > > Thanks, > Sid > -- Bjørn

Dealing with large number of small files

2022-04-26 Thread Sid
Hello, Can somebody help me with the below problem? https://stackoverflow.com/questions/72015557/dealing-with-large-number-of-small-json-files-using-pyspark Thanks, Sid

Re: Vulnerabilities in htrace-core4-4.1.0-incubating.jar jar used in spark.

2022-04-26 Thread Bjørn Jørgensen
Which version of Spark did you scan? On Tue, Apr 26, 2022 at 12:48, HARSH TAKKAR wrote: > Hello, > > Please let me know if there is a fix available for following > vulnerabilities in htrace jar used in spark jars folder. > > LIBRARY: com.fasterxml.jackson.core:jackson-databind > >

Re: Vulnerabilities in htrace-core4-4.1.0-incubating.jar jar used in spark.

2022-04-26 Thread Bjørn Jørgensen
Spark version 3.3 will have this fixed. See Spark GitHub 35981. On Tue, Apr 26, 2022 at 12:48, HARSH TAKKAR wrote: > Hello, > > Please let me know if there is a fix available for following > vulnerabilities in htrace jar used in spark jars folder. > >

Vulnerabilities in htrace-core4-4.1.0-incubating.jar jar used in spark.

2022-04-26 Thread HARSH TAKKAR
Hello, Please let me know if there is a fix available for the following vulnerabilities in the htrace jar used in the Spark jars folder. LIBRARY: com.fasterxml.jackson.core:jackson-databind VULNERABILITY IDs: CVE-2020-9548 CVE-2020-9547 CVE-2020-8840 CVE-2020-36179 CVE-2020-35491