Re: How to access line fileName in loading file using the textFile method

2018-09-26 Thread vermanurag
Spark has sc.wholeTextFiles(), which returns an RDD of tuples. The first element of each tuple is the file name and the second element is the file content.
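
A minimal PySpark sketch of this approach, assuming an existing SparkContext is passed in as sc. The helper names words_with_file and word_file_pairs are hypothetical and are not from the thread. The tokenizer is a simple regex split, chosen only for illustration.

```python
import re


def words_with_file(record):
    """Turn one (fileName, content) tuple into a list of (word, fileName) pairs."""
    file_name, content = record
    # Simple tokenizer (an assumption): every run of word characters is a word.
    return [(word, file_name) for word in re.findall(r"\w+", content)]


def word_file_pairs(sc, path):
    """Build an RDD of (word, fileName) pairs for all files under path.

    sc is assumed to be an existing SparkContext; path may be a directory
    or a glob such as "path/*".
    """
    # wholeTextFiles yields one (fileName, content) tuple per file.
    return sc.wholeTextFiles(path).flatMap(words_with_file)
```

Note that wholeTextFiles reads each file as a single record, so it suits many small files better than a few very large ones.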

Re: How to access line fileName in loading file using the textFile method

2018-09-24 Thread Maxim Gekk
> So my question is supposing all files are in a directory and I read them using sc.textFile("path/*"), how can I understand each data is for which file?
Maybe the input_file_name() function can help you:
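
One way input_file_name() might be used from PySpark is sketched here. This is not the example from the original reply. It assumes an existing SparkSession is passed in as spark, and the helper names lines_with_file_name and short_name are hypothetical.

```python
import os
from urllib.parse import urlparse


def short_name(uri):
    """Reduce a file URI such as file:///data/a.txt to its base name."""
    return os.path.basename(urlparse(uri).path)


def lines_with_file_name(spark, path):
    """Read text files into a DataFrame with columns value and fileName.

    spark is assumed to be an existing SparkSession; path may be a directory
    or a glob such as "path/*".
    """
    # Imported here so the pure helper above can be used without PySpark.
    from pyspark.sql.functions import input_file_name

    # spark.read.text puts each line in a column named "value";
    # input_file_name() adds the file that line was read from.
    return spark.read.text(path).withColumn("fileName", input_file_name())
```

input_file_name() reports the full path of the file, so short_name can be applied afterwards if only the base name is wanted.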

Re: How to access line fileName in loading file using the textFile method

2018-09-24 Thread Jörn Franke
You can create your own data source that does exactly this. Why is the file name important if the file content is the same?

> On 24. Sep 2018, at 13:53, Soheil Pourbafrani wrote:
>
> Hi, My text data are in the form of text file. In the processing logic, I need to know each word is from which

How to access line fileName in loading file using the textFile method

2018-09-24 Thread Soheil Pourbafrani
Hi, my text data are stored as text files. In the processing logic, I need to know which file each word comes from. Specifically, I need to tokenize the words and create (word, fileName) pairs. The naive solution is to call sc.textFile for each file and, keeping the fileName in a variable, create the pairs, but