[jira] [Created] (SPARK-21020) How to implement custom input source for creating streaming DataFrames?

2017-06-08 Thread Vijay (JIRA)
Vijay created SPARK-21020:
-

 Summary: How to implement custom input source for creating 
streaming DataFrames?
 Key: SPARK-21020
 URL: https://issues.apache.org/jira/browse/SPARK-21020
 Project: Spark
  Issue Type: Brainstorming
  Components: Structured Streaming
Affects Versions: 2.1.1
Reporter: Vijay
Priority: Minor


Can someone please explain how to implement a custom input source for creating 
streaming DataFrames, similar to a custom receiver in DStreams?

Any references/suggestions are appreciated.
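
For context, the DStream-side pattern the question refers to is the custom Receiver API. A minimal sketch (class and data are illustrative only, not from this issue):

{code}
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Minimal custom receiver: pushes a fixed record into the stream once a second.
class DummyReceiver extends Receiver[String](StorageLevel.MEMORY_ONLY) {
  def onStart(): Unit = {
    new Thread("dummy-receiver-thread") {
      override def run(): Unit = {
        while (!isStopped()) {
          store("hello")      // hand one record to Spark
          Thread.sleep(1000)
        }
      }
    }.start()
  }
  def onStop(): Unit = { }    // nothing to clean up in this sketch
}

// Usage with a StreamingContext (ssc):
// val lines = ssc.receiverStream(new DummyReceiver)
{code}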






[jira] [Created] (SPARK-21182) Structured streaming on Spark-shell on windows

2017-06-22 Thread Vijay (JIRA)
Vijay created SPARK-21182:
-

 Summary: Structured streaming on Spark-shell on windows
 Key: SPARK-21182
 URL: https://issues.apache.org/jira/browse/SPARK-21182
 Project: Spark
  Issue Type: Bug
  Components: Structured Streaming
Affects Versions: 2.1.1
 Environment: Windows 10
spark-2.1.1-bin-hadoop2.7
Reporter: Vijay
Priority: Minor


The Structured Streaming output operation fails in the Windows shell.

As the error message shows, the path is being prefixed with a leading "/" file 
separator, as on Linux, which causes the IllegalArgumentException.

The error message follows:

scala> val query = wordCounts.writeStream.outputMode("complete").format("console").start()
java.lang.IllegalArgumentException: Pathname {color:red}*/*{color}C:/Users/Vijay/AppData/Local/Temp/temporary-081b482c-98a4-494e-8cfb-22d966c2da01/offsets from C:/Users/Vijay/AppData/Local/Temp/temporary-081b482c-98a4-494e-8cfb-22d966c2da01/offsets is not a valid DFS filename.
  at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:197)
  at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:106)
  at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
  at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
  at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
  at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1426)
  at org.apache.spark.sql.streaming.StreamingQueryManager.createQuery(StreamingQueryManager.scala:222)
  at org.apache.spark.sql.streaming.StreamingQueryManager.startQuery(StreamingQueryManager.scala:280)
  at org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:268)
  ... 52 elided
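
One thing worth trying, though not verified here: give the query an explicit checkpoint location instead of letting Spark generate a temporary path. The path below is only an example; it may or may not sidestep the DFS resolution shown in the trace above.

{code}
val query = wordCounts.writeStream
  .outputMode("complete")
  .format("console")
  .option("checkpointLocation", "C:/tmp/spark-checkpoint")  // explicit local path; adjust as needed
  .start()
{code}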






[jira] [Commented] (SPARK-21182) Structured streaming on Spark-shell on windows

2017-06-27 Thread Vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16065921#comment-16065921
 ] 

Vijay commented on SPARK-21182:
---

I'm still facing the same issue.
I have also configured Hadoop on Windows alongside Spark.

Could that be part of the problem?
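
One way to check, as a sketch: this assumes the Hadoop configuration on the classpath sets fs.defaultFS to an hdfs:// URI, which would explain why DistributedFileSystem ends up handling a local temp path in the trace above.

{code}
scala> org.apache.hadoop.fs.FileSystem.get(spark.sparkContext.hadoopConfiguration).getUri
// An hdfs://... URI here means the default filesystem is HDFS, so the local
// temporary checkpoint path gets handed to DistributedFileSystem.
{code}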







[jira] [Created] (SPARK-4402) Output path validation of an action statement resulting in runtime exception

2014-11-14 Thread Vijay (JIRA)
Vijay created SPARK-4402:


 Summary: Output path validation of an action statement resulting 
in runtime exception
 Key: SPARK-4402
 URL: https://issues.apache.org/jira/browse/SPARK-4402
 Project: Spark
  Issue Type: Wish
Reporter: Vijay
Priority: Minor


Output path validation happens at the time of statement execution, as part of the 
lazy evaluation of the action statement. If the path already exists, a runtime 
exception is thrown, so all the processing completed up to that point is lost, 
which wastes resources (processing time and CPU usage).

If this I/O-related validation were done before the RDD action operations, the 
runtime exception could be avoided.
I believe a similar validation/feature is implemented in Hadoop as well.

Example:

SchemaRDD.saveAsTextFile() evaluates the path only at runtime.
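
As an illustration of the kind of up-front check being asked for, something like the following could run before any transformations are defined. This is a sketch only; the path and names are illustrative, not an existing Spark feature.

{code}
import org.apache.hadoop.fs.{FileSystem, Path}

val outputPath = new Path("hdfs:///user/vijay/output")          // illustrative path
val fs = outputPath.getFileSystem(sc.hadoopConfiguration)
require(!fs.exists(outputPath), s"Output path $outputPath already exists")
// ...build and run the RDD/SchemaRDD pipeline only after this check passes
{code}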






[jira] [Commented] (SPARK-4402) Output path validation of an action statement resulting in runtime exception

2014-11-15 Thread Vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213729#comment-14213729
 ] 

Vijay commented on SPARK-4402:
--

Thanks for the reply, [~srowen].

This is a different scenario from SPARK-1100.

SPARK-1100 says that the output directory is overwritten if it exists; I think 
that fix works fine.

My concern is that Spark throws a runtime exception if the output directory 
exists. This happens only after all the previous action statements have executed, 
resulting in abrupt termination of the program, and the results of those previous 
actions are lost.

Please confirm whether this abrupt program termination is expected.







[jira] [Commented] (SPARK-4402) Output path validation of an action statement resulting in runtime exception

2014-11-16 Thread Vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214320#comment-14214320
 ] 

Vijay commented on SPARK-4402:
--

Yes, the output path is being validated in PairRDDFunctions.saveAsHadoopDataset; 
please find the exception details below.
So the output path is validated only when saveAsHadoopDataset executes, after all 
the preceding statements have completed.

My question is whether it is possible to perform this validation up front, when 
the program execution starts.

Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/home/HadoopUser/eclipse-scala/test/output1 already exists
  at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:132)
  at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:968)
  at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:878)
  at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:792)
  at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1159)
  at test.OutputTest$.main(OutputTest.scala:19)
  at test.OutputTest.main(OutputTest.scala)
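
For reference, a program of roughly this shape reproduces the exception above when the output directory already exists. This is a hypothetical reconstruction, not the reporter's actual OutputTest.scala.

{code}
object OutputTest {
  def main(args: Array[String]): Unit = {
    val sc = new org.apache.spark.SparkContext("local", "OutputTest")
    val counts = sc.textFile("input.txt")
      .flatMap(_.split("\\s+"))
      .map((_, 1))
      .reduceByKey(_ + _)
    val total = counts.count()   // earlier action: runs to completion
    // Only here, while setting up the save, is the existing path detected,
    // so the work already done for count() above is effectively wasted.
    counts.saveAsTextFile("file:/home/HadoopUser/eclipse-scala/test/output1")
  }
}
{code}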







[jira] [Commented] (SPARK-4402) Output path validation of an action statement resulting in runtime exception

2014-11-17 Thread Vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214635#comment-14214635
 ] 

Vijay commented on SPARK-4402:
--

Thanks for the explanation.
It is clear now.







[jira] [Resolved] (SPARK-4402) Output path validation of an action statement resulting in runtime exception

2014-11-17 Thread Vijay (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vijay resolved SPARK-4402.
--
Resolution: Not a Problem







[jira] [Created] (SPARK-6435) spark-shell --jars option does not add all jars to classpath

2015-03-20 Thread vijay (JIRA)
vijay created SPARK-6435:


 Summary: spark-shell --jars option does not add all jars to 
classpath
 Key: SPARK-6435
 URL: https://issues.apache.org/jira/browse/SPARK-6435
 Project: Spark
  Issue Type: Bug
  Components: Spark Shell
Affects Versions: 1.3.0
 Environment: Win64
Reporter: vijay


Not all jars supplied via the --jars option will be added to the driver (and 
presumably executor) classpath.  The first jar(s) will be added, but not all.

To reproduce this, just add a few jars (I tested 5) to the --jars option, and 
then try to import a class from the last jar.  This fails.  A simple 
reproducer: 

Create a bunch of dummy jars:
jar cfM jar1.jar log.txt
jar cfM jar2.jar log.txt
jar cfM jar3.jar log.txt
jar cfM jar4.jar log.txt

Start the spark-shell with the dummy jars and guava at the end:
%SPARK_HOME%\bin\spark-shell --master local --jars jar1.jar,jar2.jar,jar3.jar,jar4.jar,c:\code\lib\guava-14.0.1.jar

In the shell, try importing from guava; you'll get an error:
{code}
scala> import com.google.common.base.Strings
:19: error: object Strings is not a member of package 
com.google.common.base
   import com.google.common.base.Strings
  ^
{code}









[jira] [Updated] (SPARK-6435) spark-shell --jars option does not add all jars to classpath

2015-03-20 Thread vijay (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vijay updated SPARK-6435:
-
Description: 
Not all jars supplied via the --jars option will be added to the driver (and 
presumably executor) classpath.  The first jar(s) will be added, but not all.

To reproduce this, just add a few jars (I tested 5) to the --jars option, and 
then try to import a class from the last jar.  This fails.  A simple 
reproducer: 

Create a bunch of dummy jars:
jar cfM jar1.jar log.txt
jar cfM jar2.jar log.txt
jar cfM jar3.jar log.txt
jar cfM jar4.jar log.txt

Start the spark-shell with the dummy jars and guava at the end:
%SPARK_HOME%\bin\spark-shell --master local --jars 
jar1.jar,jar2.jar,jar3.jar,jar4.jar,c:\code\lib\guava-14.0.1.jar

In the shell, try importing from guava; you'll get an error:
{code}
scala> import com.google.common.base.Strings
:19: error: object Strings is not a member of package 
com.google.common.base
   import com.google.common.base.Strings
  ^
{code}




  was:
Not all jars supplied via the --jars option will be added to the driver (and 
presumably executor) classpath.  The first jar(s) will be added, but not all.

To reproduce this, just add a few jars (I tested 5) to the --jars option, and 
then try to import a class from the last jar.  This fails.  A simple 
reproducer: 

Create a bunch of dummy jars:
jar cfM jar1.jar log.txt
jar cfM jar2.jar log.txt
jar cfM jar3.jar log.txt
jar cfM jar4.jar log.txt

Start the spark-shell with the dummy jars and guava at the end:
%SPARK_HOME%\bin\spark-shell --master local --jars jar1.jar,jar2.jar,jar
3.jar,jar4.jar,c:\code\lib\guava-14.0.1.jar

In the shell, try importing from guava; you'll get an error:
{code}
scala> import com.google.common.base.Strings
:19: error: object Strings is not a member of package 
com.google.common.base
   import com.google.common.base.Strings
  ^
{code}











[jira] [Commented] (SPARK-6435) spark-shell --jars option does not add all jars to classpath

2015-03-20 Thread vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371420#comment-14371420
 ] 

vijay commented on SPARK-6435:
--

It works when guava is the 1st or 2nd jar.  Not sure at what point Spark starts 
dropping jars, but I had this issue with multiple 'real' jars (i.e. containing 
.class files) in the --jars option: if I move a jar to the front of the list, 
it works; if I move it to the back, it fails.
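
A quick way to see which jars actually reached the driver, as a diagnostic sketch (not from the original report; the property is only present when --jars was parsed into the submit options):

{code}
scala> sc.getConf.getOption("spark.jars")
// Shows the comma-separated jar list the driver actually received;
// jars dropped by the launch scripts simply won't appear here.
{code}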







[jira] [Commented] (SPARK-6435) spark-shell --jars option does not add all jars to classpath

2015-03-23 Thread vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375707#comment-14375707
 ] 

vijay commented on SPARK-6435:
--

I tested this on Linux with the 1.3.0 release and it works fine, so this is 
apparently a Windows-specific issue: on Windows only the 1st jar is picked up.  
This appears to be a problem with parsing the command line, introduced by the 
change in the Windows scripts between 1.2.0 and 1.3.0.  A simple fix to 
bin\windows-utils.cmd resolves the issue.

I ran this command to test with 'real' jars:
{code}
%SPARK_HOME%\bin\spark-shell --master local --jars 
c:\code\elasticsearch-1.4.2\lib\lucene-core-4.10.2.jar,c:\temp\guava-14.0.1.jar
{code}

Here are some snippets from the console - note that only the 1st jar is added; 
I can load classes from the 1st jar but not the 2nd:
{code}
15/03/23 10:57:41 INFO SparkUI: Started SparkUI at http://vgarla-t440P.fritz.box:4040
15/03/23 10:57:41 INFO SparkContext: Added JAR file:/c:/code/elasticsearch-1.4.2/lib/lucene-core-4.10.2.jar at http://192.168.178.41:54601/jars/lucene-core-4.10.2.jar with timestamp 1427104661969
15/03/23 10:57:42 INFO Executor: Starting executor ID  on host localhost
...
scala> import org.apache.lucene.util.IOUtils
import org.apache.lucene.util.IOUtils

scala> import com.google.common.base.Strings
:20: error: object Strings is not a member of package 
com.google.common.base
{code}

Looking at the command line in jvisualvm, I see that only the 1st jar is added:
{code}
Main class: org.apache.spark.deploy.SparkSubmit
Arguments: --class org.apache.spark.repl.Main --master local --jars 
c:\code\elasticsearch-1.4.2\lib\lucene-core-4.10.2.jar spark-shell 
c:\temp\guava-14.0.1.jar
{code}
In spark 1.2.0, spark-shell2.cmd just passed arguments "as is" to the java 
command line:
{code}
cmd /V /E /C %SPARK_HOME%\bin\spark-submit.cmd --class 
org.apache.spark.repl.Main %* spark-shell
{code}

In spark 1.3.0, spark-shell2.cmd calls windows-utils.cmd to parse arguments 
into SUBMISSION_OPTS and APPLICATION_OPTS.  Only the first jar in the list 
passed to --jars makes it into the SUBMISSION_OPTS; latter jars are added to 
APPLICATION_OPTS:
{code}
call %SPARK_HOME%\bin\windows-utils.cmd %*
if %ERRORLEVEL% equ 1 (
  call :usage
  exit /b 1
)
echo SUBMISSION_OPTS=%SUBMISSION_OPTS%
echo APPLICATION_OPTS=%APPLICATION_OPTS%

cmd /V /E /C %SPARK_HOME%\bin\spark-submit.cmd --class 
org.apache.spark.repl.Main %SUBMISSION_OPTS% spark-shell %APPLICATION_OPTS%
{code}

The problem is that by the time the command-line arguments reach 
windows-utils.cmd, the Windows command-line processor has already split the 
comma-separated list into distinct arguments.  The Windows way of saying "treat 
this as a single arg" is to surround it in double quotes.  However, when I 
surround the jars in quotes, I get an error:
{code}
%SPARK_HOME%\bin\spark-shell --master local --jars 
"c:\code\elasticsearch-1.4.2\lib\lucene-core-4.10.2.jar,c:\temp\guava-14.0.1.jar"
c:\temp\guava-14.0.1.jar""=="x" was unexpected at this time.
{code}
Digging in, I see this is caused by this line from windows-utils.cmd:
{code}
  if "x%2"=="x" (
{code}

Replacing the quotes with square brackets does the trick:
{code}
  if [x%2]==[x] (
{code}

Now the command line is processed correctly.









[jira] [Commented] (SPARK-6435) spark-shell --jars option does not add all jars to classpath

2015-03-23 Thread vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376501#comment-14376501
 ] 

vijay commented on SPARK-6435:
--

I came up with square brackets after 2 minutes of googling/stackoverflowing; a 
more thorough search/understanding of bat scripts might yield a better or 
different solution (I can rule myself out of that more thorough understanding).  
That being said, this test is used to check for an empty string, and square 
brackets are the most upvoted solution: 
http://stackoverflow.com/questions/2541767/what-is-the-proper-way-to-test-if-variable-is-empty-in-a-batch-file-if-not-1








[jira] [Commented] (SPARK-6435) spark-shell --jars option does not add all jars to classpath

2015-03-27 Thread vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383566#comment-14383566
 ] 

vijay commented on SPARK-6435:
--

Strange - when I test it with multiple jars (with the fixed script) everything 
works







[jira] [Comment Edited] (SPARK-6435) spark-shell --jars option does not add all jars to classpath

2015-03-27 Thread vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383566#comment-14383566
 ] 

vijay edited comment on SPARK-6435 at 3/27/15 9:17 AM:
---

Strange - when I test it with multiple jars (with the fixed script), everything 
works.
Something has changed in some other script relative to the released 1.3.0.


was (Author: vjapache):
Strange - when I test it with multiple jars (with the fixed script) everything 
works







[jira] [Commented] (SPARK-2356) Exception: Could not locate executable null\bin\winutils.exe in the Hadoop

2015-01-29 Thread vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296775#comment-14296775
 ] 

vijay commented on SPARK-2356:
--

This is how I worked around this in Windows:
* Download and extract 
https://codeload.github.com/srccodes/hadoop-common-2.2.0-bin/zip/master
* Modify bin\spark-class2.cmd and add the hadoop.home.dir system property:
{code}
if not [%SPARK_SUBMIT_BOOTSTRAP_DRIVER%] == [] (
  set SPARK_CLASS=1
  "%RUNNER%" -Dhadoop.home.dir=C:\code\hadoop-common-2.2.0-bin-master 
org.apache.spark.deploy.SparkSubmitDriverBootstrapper %BOOTSTRAP_ARGS%
) else (
  "%RUNNER%" -Dhadoop.home.dir=C:\code\hadoop-common-2.2.0-bin-master -cp 
"%CLASSPATH%" %JAVA_OPTS% %*
)
{code}

That being said, this is a workaround for what I consider a critical bug (if 
Spark is indeed meant to support Windows).
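
An alternative that avoids editing the launch scripts, sketched here for the same extracted hadoop-common directory: set the property in the driver program before the SparkContext is created (it has to run before Hadoop's Shell class is first loaded). Not verified on every setup; setting the HADOOP_HOME environment variable should have the same effect.

{code}
// Must run before the first SparkContext/Hadoop class touches org.apache.hadoop.util.Shell.
System.setProperty("hadoop.home.dir", "C:\\code\\hadoop-common-2.2.0-bin-master")
val sc = new org.apache.spark.SparkContext("local[*]", "winutils-workaround")
{code}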


> Exception: Could not locate executable null\bin\winutils.exe in the Hadoop 
> ---
>
> Key: SPARK-2356
> URL: https://issues.apache.org/jira/browse/SPARK-2356
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Kostiantyn Kudriavtsev
>Priority: Critical
>
> I'm trying to run some transformations on Spark; they work fine on a cluster 
> (YARN, Linux machines). However, when I try to run them on a local machine 
> (Windows 7) under a unit test, I get errors (I don't use Hadoop; I read the 
> file from the local filesystem):
> {code}
> 14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in the 
> hadoop binary path
> java.io.IOException: Could not locate executable null\bin\winutils.exe in the 
> Hadoop binaries.
>   at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
>   at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
>   at org.apache.hadoop.util.Shell.(Shell.java:326)
>   at org.apache.hadoop.util.StringUtils.(StringUtils.java:76)
>   at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)
>   at org.apache.hadoop.security.Groups.(Groups.java:77)
>   at 
> org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240)
>   at 
> org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255)
>   at 
> org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283)
>   at 
> org.apache.spark.deploy.SparkHadoopUtil.(SparkHadoopUtil.scala:36)
>   at 
> org.apache.spark.deploy.SparkHadoopUtil$.(SparkHadoopUtil.scala:109)
>   at 
> org.apache.spark.deploy.SparkHadoopUtil$.(SparkHadoopUtil.scala)
>   at org.apache.spark.SparkContext.(SparkContext.scala:228)
>   at org.apache.spark.SparkContext.(SparkContext.scala:97)
> {code}
> This happens because the Hadoop config is initialized every time a Spark 
> context is created, regardless of whether Hadoop is required or not.
> I propose adding a flag to indicate whether the Hadoop config is required 
> (or starting this configuration manually).






[jira] [Created] (SPARK-5481) JdbcRDD requires JDBC 4 APIs, limiting compatible JDBC Drivers

2015-01-29 Thread vijay (JIRA)
vijay created SPARK-5481:


 Summary: JdbcRDD requires JDBC 4 APIs, limiting compatible JDBC 
Drivers
 Key: SPARK-5481
 URL: https://issues.apache.org/jira/browse/SPARK-5481
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.2.0
Reporter: vijay


JdbcRDD makes unnecessary use of JDBC 4 APIs.  To maintain broad JDBC driver 
support, Spark should require only JDBC 3.

The issue is calling isClosed() prior to closing a JDBC object; isClosed() is 
part of JDBC 4.  It is perfectly safe to close something that is already closed 
- this may throw an exception (which is caught) but has no negative side 
effects.
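
For illustration, the JDBC 3-compatible pattern being suggested is just a guarded close with no isClosed() probe. A sketch, not the actual JdbcRDD code:

{code}
import java.sql.{Connection, ResultSet, Statement}

// Works with JDBC 3 drivers: no isClosed() needed, because the JDBC spec
// defines close() on an already-closed object as a no-op.
def closeQuietly(rs: ResultSet, stmt: Statement, conn: Connection): Unit = {
  try { if (rs != null) rs.close() } catch { case e: Exception => () }      // log and ignore
  try { if (stmt != null) stmt.close() } catch { case e: Exception => () }  // log and ignore
  try { if (conn != null) conn.close() } catch { case e: Exception => () }  // log and ignore
}
{code}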






[jira] [Commented] (SPARK-5481) JdbcRDD requires JDBC 4 APIs, limiting compatible JDBC Drivers

2015-01-29 Thread vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296789#comment-14296789
 ] 

vijay commented on SPARK-5481:
--

JDBC 4 is an API; drivers implement the API, or parts thereof.  You can use 
JDBC 3-compliant drivers on Java 6, but calls to JDBC 4 methods against such 
drivers cause java.lang.AbstractMethodError exceptions.  Spark isn't doing 
anything fancy that requires any of the JDBC 4 features; AFAICT the only JDBC 4 
method used is isClosed(), which as mentioned above is superfluous.







[jira] [Commented] (SPARK-5481) JdbcRDD requires JDBC 4 APIs, limiting compatible JDBC Drivers

2015-01-29 Thread vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296838#comment-14296838
 ] 

vijay commented on SPARK-5481:
--

Legacy databases that have tons of data and are still in use; e.g. DB2 v 9.1 or 
lower: http://www-01.ibm.com/support/docview.wss?uid=swg21363866








[jira] [Commented] (SPARK-6305) Add support for log4j 2.x to Spark

2018-08-16 Thread Vijay (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583290#comment-16583290
 ] 

Vijay commented on SPARK-6305:
--

Hello,

I have a question and need help.

I am using Spark 2.x.

My spark-submit application has the Log4j 2 jars shaded as part of the build, 
and the log4j.xml is placed in the resources folder.

Can I write logs produced through the Log4j 2 API to a new file?

Could you please tell me what I need to do to make this work?

Thanks
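
Not speaking for the Spark side, but as an illustration of the application half of the question: obtaining a logger through the Log4j 2 API is independent of Spark, and routing it to its own file is then a matter of the appender/logger entries in the configuration, assuming that configuration is discoverable by Log4j 2 (e.g. named log4j2.xml, or pointed to with -Dlog4j.configurationFile). A minimal sketch; the object name is hypothetical:

{code}
import org.apache.logging.log4j.LogManager

object MyJob {
  // Resolved against whatever Log4j 2 configuration is found on the classpath.
  private val log = LogManager.getLogger(getClass)

  def run(): Unit = {
    // Ends up in whichever appender the configuration binds to this logger,
    // e.g. a File appender pointing at the desired new log file.
    log.info("written through the Log4j 2 API")
  }
}
{code}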

 

 

 

 

> Add support for log4j 2.x to Spark
> --
>
> Key: SPARK-6305
> URL: https://issues.apache.org/jira/browse/SPARK-6305
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Tal Sliwowicz
>Priority: Minor
>
> log4j 2 requires replacing the slf4j binding and adding the log4j jars in the 
> classpath. Since there are shaded jars, it must be done during the build.


