Re: Help With unstructured text file with spark scala

2022-02-25 Thread Danilo Sousa
Rafael Mendes, are you from ? Thanks.

> On 21 Feb 2022, at 15:33, Danilo Sousa wrote:
>
> Yes, this is only a single file.
>
> Thanks, Rafael Mendes.
>
>> On 13 Feb 2022, at 07:13, Rafael Mendes wrote:
>>
>> Hi, Danilo.
>> Do you have only a single large file?

2022-02-21 Thread Danilo Sousa
Yes, this is only a single file. Thanks, Rafael Mendes.

> On 13 Feb 2022, at 07:13, Rafael Mendes wrote:
>
> Hi, Danilo.
> Do you have only a single large file?
> If so, I guess you can use tools like sed/awk to split it into several files
> based on layout, so you can read these files into Spark.

2022-02-13 Thread Rafael Mendes
Hi, Danilo.

Do you have only a single large file? If so, I guess you can use tools like sed/awk to split it into several files based on layout, so you can read these files into Spark.

On Wed, 9 Feb 2022 09:30, Bitfox wrote:
> Hi
>
> I am not sure about the overall situation.
> But if you
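A minimal sketch of the split-by-layout idea in plain Scala instead of sed/awk. It assumes, hypothetically, that a line belongs to the #-delimited tabular layout when it has exactly `expectedCols` fields, and that every other line belongs to the key-value layout. The names `SplitByLayout`, `splitFile`, and `expectedCols`, as well as the UTF-8 input encoding, are illustrative assumptions rather than anything from the thread.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import scala.io.Source

// Hypothetical helper: one possible way to split a mixed-layout file.
object SplitByLayout {
  // Assumption: tabular lines have exactly `expectedCols` '#'-separated
  // fields; every other line is treated as part of the key-value block.
  def split(lines: Seq[String], expectedCols: Int): (Seq[String], Seq[String]) =
    lines.partition(_.split("#", -1).length == expectedCols)

  // Reads `input` and writes each layout to its own file so each one can be
  // loaded into Spark with a suitable reader. Assumes UTF-8 input.
  def splitFile(input: String, tabularOut: String, keyValueOut: String, expectedCols: Int): Unit = {
    val src = Source.fromFile(input, "UTF-8")
    val lines = try src.getLines().toVector finally src.close()
    val (tabular, keyValue) = split(lines, expectedCols)
    Files.write(Paths.get(tabularOut), tabular.mkString("\n").getBytes(StandardCharsets.UTF_8))
    Files.write(Paths.get(keyValueOut), keyValue.mkString("\n").getBytes(StandardCharsets.UTF_8))
  }

  def main(args: Array[String]): Unit = args match {
    case Array(in, tab, kv, cols) => splitFile(in, tab, kv, cols.toInt)
    case _ => println("usage: SplitByLayout <input> <tabularOut> <keyValueOut> <expectedCols>")
  }
}
```

The tabular output can then be read with the same `csv` reader options (`sep` set to `#`, `header` set to `true`) shown in the PySpark reply in this thread.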

2022-02-09 Thread Bitfox
Hi,

I am not sure about the overall situation, but if you want a Scala solution, I think you could use a regex to match and capture the keywords. Here is one I wrote that you can modify on your end.

import scala.io.Source
import scala.collection.mutable.ArrayBuffer
val list1 =
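Because the code in the message above is cut off, here is a separate minimal sketch of the regex idea. It is not a reconstruction of that code. It assumes that a label and its value are separated by `#`, as in the `Carteira em#27/12/2019` line quoted in this thread. `KeywordCapture`, `keywordRegex`, and `capture` are hypothetical names.

```scala
import java.util.regex.Pattern
import scala.util.matching.Regex

// Hypothetical helper, not the (truncated) code from the message above.
object KeywordCapture {
  // Finds the literal `keyword`, then '#', and captures the text up to the
  // next '#' or the end of the line. Assumes '#' separates label and value.
  def keywordRegex(keyword: String): Regex =
    (Pattern.quote(keyword) + "#([^#]*)").r

  // For each keyword, takes the value from the first line that matches;
  // keywords that never match are left out of the resulting map.
  def capture(lines: Seq[String], keywords: Seq[String]): Map[String, String] =
    keywords.flatMap { kw =>
      val re = keywordRegex(kw)
      lines.iterator
        .map(re.findFirstMatchIn(_))
        .collectFirst { case Some(m) => kw -> m.group(1).trim }
    }.toMap

  def main(args: Array[String]): Unit = {
    val sample = Seq("Carteira em#27/12/2019##Todos os Beneficiários")
    println(capture(sample, Seq("Carteira em")))
  }
}
```

`Pattern.quote` keeps labels containing spaces or regex metacharacters literal. The captured map could then supply the key-value columns of each output row.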

2022-02-09 Thread Danilo Sousa
Hello, how are you? Thanks for your time.

> Does the data contain records?

Yes.

> Are the records "homogeneous", i.e., do they have the same fields?

Yes, the data is homogeneous, but there are "two layouts" in the same file.

> What is the format of the data?

All data is strings, in a .txt file.

> Are records

2022-02-09 Thread Danilo Sousa
Hello,

Yes, I can open this block as CSV with the # delimiter, but there is another block that is not in CSV format; it is more like key-value. We have two different layouts in the same file, and that is the "problem". Thanks for your time.

> Relação de Beneficiários Ativos e Excluídos
> Carteira

2022-02-08 Thread Bitfox
Hello,

You can treat it as a CSV file and load it from Spark:

>>> df = spark.read.format("csv").option("inferSchema", "true").option("header", "true").option("sep", "#").load(csv_file)
>>> df.show()
++---+-+
| Plano|Código

2022-02-08 Thread Lalwani, Jayesh
You will need to provide more info:

Does the data contain records?
Are the records "homogeneous", i.e., do they have the same fields?
What is the format of the data?
Are records separated by lines/separators?
Is the data sharded across multiple files? How big is each shard?

On 2/8/22, 11:50

Help With unstructured text file with spark scala

2022-02-08 Thread Danilo Sousa
Hi,

I have to transform unstructured text into a DataFrame. Could anyone please help with Scala code? The DataFrame needs these columns:

operadora, filial, unidade, contrato, empresa, plano, codigo_beneficiario, nome_beneficiario

Relação de Beneficiários Ativos e Excluídos
Carteira em#27/12/2019##Todos os Beneficiários
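The core difficulty in this thread is that the values in the key-value block apply to the #-delimited rows that follow them. One common approach is forward-filling: remember the latest key-value values while scanning the file, then copy them onto each data row. Below is a minimal plain-Scala sketch of that approach, built on hypothetical assumptions. The labels `Operadora`, `Filial`, `Unidade`, `Contrato`, `Empresa` and the three-field `plano#codigo#nome` row shape are guesses, because the real layout is truncated in the thread. `ForwardFill`, `Beneficiario`, and `parse` are illustrative names.

```scala
import scala.collection.mutable

// Hypothetical sketch: field names mirror the columns requested above.
object ForwardFill {
  final case class Beneficiario(
      operadora: String,
      filial: String,
      unidade: String,
      contrato: String,
      empresa: String,
      plano: String,
      codigo_beneficiario: String,
      nome_beneficiario: String)

  // Assumed labels of the key-value lines ("Label#value"); replace with the real ones.
  val HeaderKeys: Set[String] = Set("Operadora", "Filial", "Unidade", "Contrato", "Empresa")

  def parse(lines: Seq[String]): Seq[Beneficiario] = {
    val context = mutable.Map.empty[String, String] // latest value seen for each label
    val out = Seq.newBuilder[Beneficiario]
    for (line <- lines) {
      val fields = line.split("#", -1).map(_.trim)
      if (fields.length == 2 && HeaderKeys.contains(fields(0))) {
        context(fields(0)) = fields(1) // a new key-value line replaces the old value
      } else if (fields.length == 3 && fields(0) != "Plano") { // skip the column-header row
        def h(label: String): String = context.getOrElse(label, "")
        out += Beneficiario(h("Operadora"), h("Filial"), h("Unidade"), h("Contrato"),
          h("Empresa"), fields(0), fields(1), fields(2))
      }
      // any other line (e.g. titles or "Carteira em#...") is ignored
    }
    out.result()
  }

  def main(args: Array[String]): Unit =
    parse(Seq("Operadora#OP1", "Plano#Código#Nome", "P1#001#ANA")).foreach(println)
}
```

The case-class fields are named after the requested columns. In Spark, the result could therefore become a DataFrame with `import spark.implicits._` followed by `parse(lines).toDF()`. Parsing on the driver like this only suits a file small enough to read into memory.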