Re: Error in spark-xml

2016-05-01 Thread Mail.com
Can you try creating your own schema and using it to read the XML?

I had a similar issue but resolved it with a custom schema, specifying each
attribute in it.

Pradeep
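
As a sketch of the custom-schema approach (the field name, schema shape, and
path below are assumptions for illustration, not details from the thread):

import org.apache.spark.sql.types._

// Hypothetical schema for a <bkval> row whose repeated <book> children
// become an array of strings; adjust to the real document structure.
val bkSchema = StructType(Seq(
  StructField("book", ArrayType(StringType), nullable = true)
))

val df = sqlContext.read
  .format("xml")
  .option("rowTag", "bkval")
  .schema(bkSchema)   // supplying a schema also skips schema inference
  .load("path-to-file")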


> On May 1, 2016, at 9:45 AM, Hyukjin Kwon  wrote:
> 
> To be more clear,
> 
> If you set the rowTag as "book", then it will produce an exception, which is 
> an issue opened here: https://github.com/databricks/spark-xml/issues/92
> 
> Currently it does not support parsing a single element with only a value as 
> a row.
> 
> 
> If you set the rowTag as "bkval", then it should work. I tested the case 
> below to double-check.
> 
> If it does not work as below, please open an issue with some information so 
> that I can reproduce it.
> 
> 
> I tested the case above with the data below
> <books>
>   <bkval>
>     <book>bk_113</book>
>     <book>bk_114</book>
>   </bkval>
>   <bkval>
>     <book>bk_114</book>
>     <book>bk_116</book>
>   </bkval>
>   <bkval>
>     <book>bk_115</book>
>     <book>bk_116</book>
>   </bkval>
> </books>
> 
> 
> I tested this with the code below
> 
> val path = "path-to-file"
> sqlContext.read
>   .format("xml")
>   .option("rowTag", "bkval")
>   .load(path)
>   .show()
> 
> Thanks!
> 
> 
> 2016-05-01 15:11 GMT+09:00 Hyukjin Kwon :
>> Hi Sourav,
>> 
>> I think it is an issue. spark-xml assumes the element identified by the rowTag is an object.
>> 
>> Could you please open an issue at 
>> https://github.com/databricks/spark-xml/issues?
>> 
>> Thanks!
>> 
>> 
>> 2016-05-01 5:08 GMT+09:00 Sourav Mazumder :
>>> Hi,
>>> 
>>> Looks like there is a problem in spark-xml if the XML has multiple 
>>> attributes with no child element.
>>> 
>>> For example, say the XML has a nested object as below:
>>> <bkval>
>>> <book>bk_113</book>
>>> <book>bk_114</book>
>>> </bkval>
>>> 
>>> Now if I create a DataFrame with rowTag bkval and then do a select on 
>>> that DataFrame, it gives the following error.
>>> 
>>> 
>>> scala.MatchError: ENDDOCUMENT (of class 
>>> com.sun.xml.internal.stream.events.EndDocumentEvent) at 
>>> com.databricks.spark.xml.parsers.StaxXmlParser$.checkEndElement(StaxXmlParser.scala:94)
>>>  at  
>>> com.databricks.spark.xml.parsers.StaxXmlParser$.com$databricks$spark$xml$parsers$StaxXmlParser$$convertObject(StaxXmlParser.scala:295)
>>>  at 
>>> com.databricks.spark.xml.parsers.StaxXmlParser$$anonfun$parse$1$$anonfun$apply$4.apply(StaxXmlParser.scala:58)
>>>  at 
>>> com.databricks.spark.xml.parsers.StaxXmlParser$$anonfun$parse$1$$anonfun$apply$4.apply(StaxXmlParser.scala:46)
>>>  at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at 
>>> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) at 
>>> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) at 
>>> scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308) at 
>>> scala.collection.Iterator$class.foreach(Iterator.scala:727) at 
>>> scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at 
>>> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) at 
>>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) 
>>> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) 
>>> at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) at 
>>> scala.collection.AbstractIterator.to(Iterator.scala:1157) at 
>>> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) 
>>> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) at 
>>> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) 
>>> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) at 
>>> org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:215)
>>>  at 
>>> org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:215)
>>>  at 
>>> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
>>>  at 
>>> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
>>>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) at 
>>> org.apache.spark.scheduler.Task.run(Task.scala:88) at 
>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at 
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>  at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>  at java.lang.Thread.run(Thread.java:745)
>>> 
>>> However, if there is only one row like below, it works fine.
>>> 
>>> <bkval>
>>> <book>bk_113</book>
>>> </bkval>
>>> 
>>> Any workaround?
>>> 
>>> Regards,
>>> Sourav
> 


Re: Why Non-resolvable parent POM for org.apache.spark:spark-parent_2.10:1.6.1: Could not transfer artifact org.apache:apache:pom:14 from/to central(https://repo1.maven.org/maven2): repo1.maven.org: unk

2016-05-01 Thread sunday2000
Hi, after stopping the Zinc server, I got this error message:
[INFO] Spark Project External Kafka Assembly .. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 8.649 s
[INFO] Finished at: 2016-05-02T14:19:21+08:00
[INFO] Final Memory: 38M/213M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal 
net.alchim31.maven:scala-maven-plugin:3.2.2:compile (scala-compile-first) on 
project spark-test-tags_2.10: Execution scala-compile-first of goal 
net.alchim31.maven:scala-maven-plugin:3.2.2:compile failed. CompileFailed -> 
[Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal 
net.alchim31.maven:scala-maven-plugin:3.2.2:compile (scala-compile-first) on 
project spark-test-tags_2.10: Execution scala-compile-first of goal 
net.alchim31.maven:scala-maven-plugin:3.2.2:compile failed.
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:224)
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
at 
org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
at 
org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
at org.apache.maven.cli.MavenCli.execute(MavenCli.java:862)
at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:286)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:197)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
at 
org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
at 
org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
at 
org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
Caused by: org.apache.maven.plugin.PluginExecutionException: Execution 
scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile 
failed.
at 
org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:145)
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
... 20 more
Caused by: javac returned nonzero exit code
at sbt.compiler.JavaCompiler$JavaTool0.compile(JavaCompiler.scala:77)
at sbt.compiler.JavaTool$class.apply(JavaCompiler.scala:35)
at sbt.compiler.JavaCompiler$JavaTool0.apply(JavaCompiler.scala:63)
at sbt.compiler.JavaCompiler$class.compile(JavaCompiler.scala:21)
at sbt.compiler.JavaCompiler$JavaTool0.compile(JavaCompiler.scala:63)
at 
sbt.compiler.AggressiveCompile$$anonfun$3$$anonfun$compileJava$1$1.apply$mcV$sp(AggressiveCompile.scala:127)
at 
sbt.compiler.AggressiveCompile$$anonfun$3$$anonfun$compileJava$1$1.apply(AggressiveCompile.scala:127)
at 
sbt.compiler.AggressiveCompile$$anonfun$3$$anonfun$compileJava$1$1.apply(AggressiveCompile.scala:127)
at 
sbt.compiler.AggressiveCompile.sbt$compiler$AggressiveCompile$$timed(AggressiveCompile.scala:166)
at 
sbt.compiler.AggressiveCompile$$anonfun$3.compileJava$1(AggressiveCompile.scala:126)
at 
sbt.compiler.AggressiveCompile$$anonfun$3.apply(AggressiveCompile.scala:143)
at 
sbt.compiler.AggressiveCompile$$anonfun$3.apply(AggressiveCompile.scala:87)
at 
sbt.inc.IncrementalCompile$$anonfun$doCompile$1.apply(Compile.scala:39)
at 
sbt.inc.IncrementalCompile$$anonfun$doCompile$1.apply(Compile.scala:37)
at sbt.inc.IncrementalCommon.cycle(Incremental.scala:99)
at sbt.inc.Incremental$$anonfun$1.apply(Incremental.scala:38)
at sbt.inc.Incremental$$anonfun$1.apply(Incremental.scala:37)
at sbt.inc.Incremental$.manageClassfiles(Incremental.scala:65)
at sbt.inc.Incremental$.compile(Incremental.scala:37)
at sbt.inc.IncrementalCompile$.apply(Compile.scala:27)
at s

SparkSQL with large result size

2016-05-01 Thread Buntu Dev
I have a 10g limit on the executors and am operating on a parquet dataset
with block size 70M and 200 blocks. I keep hitting the memory limits when
doing a 'select * from t1 order by c1 limit 1000000' (i.e., 1M rows). It
works if I limit to, say, 100k. What are the options for saving a large
result set without running into memory issues?

Thanks!
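
One option, sketched here as an assumption rather than a confirmed fix: write
the ordered, limited result straight back to storage instead of collecting it
through the driver (the output path below is hypothetical):

// Sketch: persist the top 1M rows to HDFS rather than fetching them
// to the driver with collect().
val top = sqlContext.sql("select * from t1 order by c1 limit 1000000")
top.write.parquet("hdfs:///tmp/t1_top1m")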


using amazon STS with spark

2016-05-01 Thread Luke Rohde
Hi - I'm using s3 storage with spark and would like to use AWS credentials
provided by STS to authenticate. I'm doing the following to use those
credentials:

val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3.awsAccessKeyId",credentials.getAccessKeyId)
hadoopConf.set("fs.s3.awsSecretAccessKey",credentials.getSecretAccessKey)

This is fine, but the credentials provided by the assume role API call are
temporary and expire after a maximum 1 hour lifetime. Does anyone have any
suggestions for jobs that would extend beyond a 1-hour duration and thus
would require resetting that credential config?

In general: does modifying a value in the SparkConf on the driver propagate
to executors after the executors start? I could imagine having a background
thread on the driver periodically refresh the credentials if so.

Thanks in advance.
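
For reference, a rough sketch of the background-refresh idea, assuming the
AWS Java SDK v1 STS client; the role ARN, session name, and interval are made
up, and this only updates the driver-side configuration, which is exactly the
propagation question above:

import java.util.concurrent.{Executors, TimeUnit}
import com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient
import com.amazonaws.services.securitytoken.model.AssumeRoleRequest

// Sketch: periodically re-assume the role and push fresh keys into the
// driver's Hadoop configuration.
val sts = new AWSSecurityTokenServiceClient()
val scheduler = Executors.newSingleThreadScheduledExecutor()

scheduler.scheduleAtFixedRate(new Runnable {
  def run(): Unit = {
    val creds = sts.assumeRole(new AssumeRoleRequest()
        .withRoleArn("arn:aws:iam::123456789012:role/my-spark-role")
        .withRoleSessionName("spark-job"))
      .getCredentials
    val hadoopConf = sc.hadoopConfiguration
    hadoopConf.set("fs.s3.awsAccessKeyId", creds.getAccessKeyId)
    hadoopConf.set("fs.s3.awsSecretAccessKey", creds.getSecretAccessKey)
  }
}, 0, 45, TimeUnit.MINUTES)  // refresh well before the 1-hour expiry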


Re: Spark on AWS

2016-05-01 Thread Teng Qiu
Hi, here we made several optimizations for accessing s3 from spark:
https://github.com/apache/spark/compare/branch-1.6...zalando:branch-1.6-zalando

such as:
https://github.com/apache/spark/compare/branch-1.6...zalando:branch-1.6-zalando#diff-d579db9a8f27e0bbef37720ab14ec3f6R133

you can deploy our spark package using our docker image, just simply:

docker run -d --net=host \
   -e START_MASTER="true" \
   -e START_WORKER="true" \
   -e START_WEBAPP="true" \
   -e START_NOTEBOOK="true" \
   registry.opensource.zalan.do/bi/spark:1.6.2-6


a Jupyter notebook will be running on the notebook port


have fun

Best,

Teng

2016-04-29 12:37 GMT+02:00 Steve Loughran :
>
> On 28 Apr 2016, at 22:59, Alexander Pivovarov  wrote:
>
> Spark works well with S3 (read and write). However it's recommended to set
> spark.speculation true (it's expected that some tasks fail if you read large
> S3 folder, so speculation should help)
>
>
>
> I must disagree.
>
> Speculative execution has >1 executor running the query, with whoever
> finishes first winning.
> however, "finishes first" is implemented in the output committer, by
> renaming the attempt's output directory to the final output directory:
> whoever renames first wins.
> This relies on rename() being implemented in the filesystem client as an
> atomic transaction.
> Unfortunately, S3 doesn't do renames. Instead every file gets copied to one
> with the new name, then the old file is deleted; an operation that takes
> time O(data * files).
>
> if you have more than one executor trying to commit the work simultaneously,
> your output will be a mess of both executions, without anything detecting
> and reporting it.
>
> Where did you find this recommendation to set speculation=true?
>
> -Steve
>
> see also: https://issues.apache.org/jira/browse/SPARK-10063




Is DataFrame randomSplit Deterministic?

2016-05-01 Thread Brandon White
If I have the same data, the same ratios, and same sample seed, will I get
the same splits every time?
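
For reference, a minimal sketch of the call in question (the DataFrame name
is hypothetical); note that whether the splits match can also depend on the
input partitioning being identical across runs:

// Same weights and same seed on both calls.
val Array(train1, test1) = df.randomSplit(Array(0.7, 0.3), seed = 42L)
val Array(train2, test2) = df.randomSplit(Array(0.7, 0.3), seed = 42L)
println(train1.count() == train2.count())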


Re: Why Non-resolvable parent POM for org.apache.spark:spark-parent_2.10:1.6.1: Could not transfer artifact org.apache:apache:pom:14 from/to central(https://repo1.maven.org/maven2): repo1.maven.org: unkn

2016-05-01 Thread Ted Yu
bq. Caused by: Compile failed via zinc server

Looks like Zinc got in the way of compilation.

Consider stopping Zinc and doing a clean build.

On Sun, May 1, 2016 at 8:35 AM, sunday2000 <2314476...@qq.com> wrote:

> Error message:
> [debug] External API changes: API Changes: Set()
> [debug] Modified binary dependencies: Set()
> [debug] Initial directly invalidated sources:
> Set(/root/dl/spark-1.6.1/tags/src/main/java/org/apache/spark/tags/DockerTest.java,
> /root/dl/spark-1.6.1/tags/src/main/java/org/apache/spark/tags/ExtendedHiveTest.java,
> /root/dl/spark-1.6.1/tags/src/main/java/org/apache/spark/tags/ExtendedYarnTest.java)
> [debug]
> [debug] Sources indirectly invalidated by:
> [debug] product: Set()
> [debug] binary dep: Set()
> [debug] external source: Set()
> [debug] All initially invalidated sources:
> Set(/root/dl/spark-1.6.1/tags/src/main/java/org/apache/spark/tags/DockerTest.java,
> /root/dl/spark-1.6.1/tags/src/main/java/org/apache/spark/tags/ExtendedHiveTest.java,
> /root/dl/spark-1.6.1/tags/src/main/java/org/apache/spark/tags/ExtendedYarnTest.java)
> [debug] Recompiling all 3 sources: invalidated sources (3) exceeded 50.0%
> of all sources
> [info] Compiling 3 Java sources to
> /root/dl/spark-1.6.1/tags/target/scala-2.10/classes...
> [debug] Attempting to call javac directly...
> [debug] com.sun.tools.javac.Main not found with appropriate method
> signature; forking javac instead
> [debug] Forking javac: javac @/tmp/sbt_e1764914/argfile
> [error] javac: invalid source release: 1.7
> [error] Usage: javac  
> [error] use -help for a list of possible options
> [debug] javac returned exit code: 2
> [error] Compile failed at May 1, 2016 11:33:06 PM [0.420s]
> [INFO]
> ------------------------------------------------------------------------
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Spark Project Parent POM ... SUCCESS [
>  6.542 s]
> [INFO] Spark Project Test Tags  FAILURE [
>  1.412 s]
> [INFO] Spark Project Launcher . SKIPPED
> [INFO] Spark Project Networking ... SKIPPED
> [INFO] Spark Project Shuffle Streaming Service  SKIPPED
> [INFO] Spark Project Unsafe ... SKIPPED
> [INFO] Spark Project Core . SKIPPED
> [INFO] Spark Project Bagel  SKIPPED
> [INFO] Spark Project GraphX ... SKIPPED
> [INFO] Spark Project Streaming  SKIPPED
> [INFO] Spark Project Catalyst . SKIPPED
> [INFO] Spark Project SQL .. SKIPPED
> [INFO] Spark Project ML Library ... SKIPPED
> [INFO] Spark Project Tools  SKIPPED
> [INFO] Spark Project Hive . SKIPPED
> [INFO] Spark Project Docker Integration Tests . SKIPPED
> [INFO] Spark Project REPL . SKIPPED
> [INFO] Spark Project Assembly . SKIPPED
> [INFO] Spark Project External Twitter . SKIPPED
> [INFO] Spark Project External Flume Sink .. SKIPPED
> [INFO] Spark Project External Flume ... SKIPPED
> [INFO] Spark Project External Flume Assembly .. SKIPPED
> [INFO] Spark Project External MQTT  SKIPPED
> [INFO] Spark Project External MQTT Assembly ... SKIPPED
> [INFO] Spark Project External ZeroMQ .. SKIPPED
> [INFO] Spark Project External Kafka ... SKIPPED
> [INFO] Spark Project Examples . SKIPPED
> [INFO] Spark Project External Kafka Assembly .. SKIPPED
> [INFO]
> ------------------------------------------------------------------------
> [INFO] BUILD FAILURE
> [INFO]
> ------------------------------------------------------------------------
> [INFO] Total time: 9.995 s
> [INFO] Finished at: 2016-05-01T23:33:06+08:00
> [INFO] Final Memory: 36M/202M
> [INFO]
> ------------------------------------------------------------------------
> [ERROR] Failed to execute goal
> net.alchim31.maven:scala-maven-plugin:3.2.2:compile (scala-compile-first)
> on project spark-test-tags_2.10: Execution scala-compile-first of goal
> net.alchim31.maven:scala-maven-plugin:3.2.2:compile failed. CompileFailed
> -> [Help 1]
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute
> goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile
> (scala-compile-first) on project spark-test-tags_2.10: Execution
> scala-compile-first of goal
> net.alchim31.maven:scala-maven-plugin:3.2.2:compile failed.
> at
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:224)
> at
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
> at
> org.apache.maven.li

Re: Why Non-resolvable parent POM for org.apache.spark:spark-parent_2.10:1.6.1: Could not transfer artifact org.apache:apache:pom:14 from/to central(https://repo1.maven.org/maven2): repo1.maven.org: u

2016-05-01 Thread sunday2000
Error message:
[debug] External API changes: API Changes: Set()
[debug] Modified binary dependencies: Set()
[debug] Initial directly invalidated sources: 
Set(/root/dl/spark-1.6.1/tags/src/main/java/org/apache/spark/tags/DockerTest.java,
 
/root/dl/spark-1.6.1/tags/src/main/java/org/apache/spark/tags/ExtendedHiveTest.java,
 
/root/dl/spark-1.6.1/tags/src/main/java/org/apache/spark/tags/ExtendedYarnTest.java)
[debug] 
[debug] Sources indirectly invalidated by:
[debug] product: Set()
[debug] binary dep: Set()
[debug] external source: Set()
[debug] All initially invalidated sources: 
Set(/root/dl/spark-1.6.1/tags/src/main/java/org/apache/spark/tags/DockerTest.java,
 
/root/dl/spark-1.6.1/tags/src/main/java/org/apache/spark/tags/ExtendedHiveTest.java,
 
/root/dl/spark-1.6.1/tags/src/main/java/org/apache/spark/tags/ExtendedYarnTest.java)
[debug] Recompiling all 3 sources: invalidated sources (3) exceeded 50.0% of 
all sources
[info] Compiling 3 Java sources to 
/root/dl/spark-1.6.1/tags/target/scala-2.10/classes...
[debug] Attempting to call javac directly...
[debug] com.sun.tools.javac.Main not found with appropriate method signature; 
forking javac instead
[debug] Forking javac: javac @/tmp/sbt_e1764914/argfile
[error] javac: invalid source release: 1.7
[error] Usage: javac  
[error] use -help for a list of possible options
[debug] javac returned exit code: 2
[error] Compile failed at May 1, 2016 11:33:06 PM [0.420s]
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Spark Project Parent POM ... SUCCESS [  6.542 s]
[INFO] Spark Project Test Tags  FAILURE [  1.412 s]
[INFO] Spark Project Launcher . SKIPPED
[INFO] Spark Project Networking ... SKIPPED
[INFO] Spark Project Shuffle Streaming Service  SKIPPED
[INFO] Spark Project Unsafe ... SKIPPED
[INFO] Spark Project Core . SKIPPED
[INFO] Spark Project Bagel  SKIPPED
[INFO] Spark Project GraphX ... SKIPPED
[INFO] Spark Project Streaming  SKIPPED
[INFO] Spark Project Catalyst . SKIPPED
[INFO] Spark Project SQL .. SKIPPED
[INFO] Spark Project ML Library ... SKIPPED
[INFO] Spark Project Tools  SKIPPED
[INFO] Spark Project Hive . SKIPPED
[INFO] Spark Project Docker Integration Tests . SKIPPED
[INFO] Spark Project REPL . SKIPPED
[INFO] Spark Project Assembly . SKIPPED
[INFO] Spark Project External Twitter . SKIPPED
[INFO] Spark Project External Flume Sink .. SKIPPED
[INFO] Spark Project External Flume ... SKIPPED
[INFO] Spark Project External Flume Assembly .. SKIPPED
[INFO] Spark Project External MQTT  SKIPPED
[INFO] Spark Project External MQTT Assembly ... SKIPPED
[INFO] Spark Project External ZeroMQ .. SKIPPED
[INFO] Spark Project External Kafka ... SKIPPED
[INFO] Spark Project Examples . SKIPPED
[INFO] Spark Project External Kafka Assembly .. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 9.995 s
[INFO] Finished at: 2016-05-01T23:33:06+08:00
[INFO] Final Memory: 36M/202M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal 
net.alchim31.maven:scala-maven-plugin:3.2.2:compile (scala-compile-first) on 
project spark-test-tags_2.10: Execution scala-compile-first of goal 
net.alchim31.maven:scala-maven-plugin:3.2.2:compile failed. CompileFailed -> 
[Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal 
net.alchim31.maven:scala-maven-plugin:3.2.2:compile (scala-compile-first) on 
project spark-test-tags_2.10: Execution scala-compile-first of goal 
net.alchim31.maven:scala-maven-plugin:3.2.2:compile failed.
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:224)
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
at 
org.apache.maven.lifecycle.internal.builder.singlethreaded.

Re: Can not import KafkaProducer in spark streaming job

2016-05-01 Thread أنس الليثي
From the bin directory of Spark:

*./spark-submit --master spark://localhost:7077
/home/anas/spark/historical_streamer.py*



On 1 May 2016 at 16:30, Ted Yu  wrote:

> According to
> examples/src/main/scala/org/apache/spark/examples/streaming/KafkaWordCount.scala
> :
>
> import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig,
> ProducerRecord}
>
> Can you give the command line you used to submit the job ?
>
> Probably classpath issue.
>
> On Sun, May 1, 2016 at 5:11 AM, fanooos  wrote:
>
>> I have a very strange problem.
>>
>> I wrote a Spark Streaming job that monitors an HDFS directory, reads the
>> newly added files, and sends the contents to Kafka.
>>
>> The job is written in Python and you can get the code from this link:
>>
>> http://pastebin.com/mpKkMkph
>>
>> When submitting the job I got that error
>>
>> *ImportError: cannot import name KafkaProducer*
>>
>> As you see, the error is very simple, but the problem is that I could
>> import the KafkaProducer from both the Python and PySpark shells without
>> any problem.
>>
>> I tried to reboot the machine but the situation remains the same.
>>
>> What do you think the problem is?
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Can-not-import-KafkaProducer-in-spark-streaming-job-tp26857.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>>
>>
>


-- 
Anas Rabei
Senior Software Developer
Mubasher.info
anas.ra...@mubasher.info


Re: Why Non-resolvable parent POM for org.apache.spark:spark-parent_2.10:1.6.1: Could not transfer artifact org.apache:apache:pom:14 from/to central(https://repo1.maven.org/maven2): repo1.maven.org: u

2016-05-01 Thread sunday2000
Downloading: 
https://repository.apache.org/content/repositories/releases/org/apache/apache/14/apache-14.pom
Downloading: 
https://repository.jboss.org/nexus/content/repositories/releases/org/apache/apache/14/apache-14.pom
Downloading: 
https://repo.eclipse.org/content/repositories/paho-releases/org/apache/apache/14/apache-14.pom
Downloading: 
https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/apache/14/apache-14.pom
Downloading: 
https://oss.sonatype.org/content/repositories/orgspark-project-1113/org/apache/apache/14/apache-14.pom
Downloading: http://repository.mapr.com/maven/org/apache/apache/14/apache-14.pom
Downloading: http://maven.twttr.com/org/apache/apache/14/apache-14.pom
[ERROR] [ERROR] Some problems were encountered while processing the POMs:
[FATAL] Non-resolvable parent POM for org.apache.spark:spark-parent_2.10:1.6.1: 
Could not transfer artifact org.apache:apache:pom:14 from/to central 
(https://repo1.maven.org/maven2): Connect to repo1.maven.org:443 
[repo1.maven.org/23.235.47.209] failed: Connection timed out and 
'parent.relativePath' points at wrong local POM @ line 22, column 11
 @ 
[ERROR] The build could not read 1 project -> [Help 1]
[ERROR]   
[ERROR]   The project org.apache.spark:spark-parent_2.10:1.6.1 
(/data/spark/spark-1.6.1/pom.xml) has 1 error
[ERROR] Non-resolvable parent POM for 
org.apache.spark:spark-parent_2.10:1.6.1: Could not transfer artifact 
org.apache:apache:pom:14 from/to central (https://repo1.maven.org/maven2): 
Connect to repo1.maven.org:443 [repo1.maven.org/23.235.47.209] failed: 
Connection timed out and 'parent.relativePath' points at wrong local POM @ line 
22, column 11 -> [Help 2]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException
[ERROR] [Help 2] 
http://cwiki.apache.org/confluence/display/MAVEN/UnresolvableModelException





------------------ Original Message ------------------
From: "Ted Yu";
Sent: Sunday, May 1, 2016, 9:50 PM
To: "sunday2000" <2314476...@qq.com>; 
Cc: "user"; 
Subject: Re: Why Non-resolvable parent POM for 
org.apache.spark:spark-parent_2.10:1.6.1: Could not transfer artifact 
org.apache:apache:pom:14 from/to 
central (https://repo1.maven.org/maven2): repo1.maven.org: unknown error



FYI

Accessing the link below gave me 'Page does not exist'


I am in California.


I checked the dependency tree of 1.6.1 - I didn't see such a dependency.


Can you pastebin the related Maven output?


Thanks


On Sun, May 1, 2016 at 6:32 AM, sunday2000 <2314476...@qq.com> wrote:
Seems it is failing to download this URL:
http://maven.twttr.com/org/apache/apache/14/apache-14.pom




------------------ Original Message ------------------
From: "Ted Yu";
Sent: Sunday, May 1, 2016, 9:27 PM
To: "sunday2000" <2314476...@qq.com>; 
Cc: "user"; 
Subject: Re: Why Non-resolvable parent POM for 
org.apache.spark:spark-parent_2.10:1.6.1: Could not transfer artifact 
org.apache:apache:pom:14 from/to central (https://repo1.maven.org/maven2): 
repo1.maven.org: unknown error



bq. Non-resolvable parent POM for org.apache.spark:spark-parent_2.10:1.6.1

Looks like you were using Spark 1.6.1


Can you check firewall settings ?


I saw similar reports from Chinese users.


Consider using proxy.


On Sun, May 1, 2016 at 4:19 AM, sunday2000 <2314476...@qq.com> wrote:
Hi,
  We are compiling Spark 1.6.0 on a Linux server and getting this error 
message. Could you tell us how to solve it? Thanks.


[INFO] Scanning for projects...
Downloading: https://repo1.maven.org/maven2/org/apache/apache/14/apache-14.pom
Downloading: 
https://repository.apache.org/content/repositories/releases/org/apache/apache/14/apache-14.pom
Downloading: 
https://repository.jboss.org/nexus/content/repositories/releases/org/apache/apache/14/apache-14.pom
Downloading: 
https://repo.eclipse.org/content/repositories/paho-releases/org/apache/apache/14/apache-14.pom
Downloading: 
https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/apache/14/apache-14.pom
Downloading: 
https://oss.sonatype.org/content/repositories/orgspark-project-1113/org/apache/apache/14/apache-14.pom
Downloading: http://repository.mapr.com/maven/org/apache/apache/14/apache-14.pom
Downloading: http://maven.twttr.com/org/apache/apache/14/apache-14.pom
[ERROR] [ERROR] Some problems were encountered while processing the POMs:
[FATAL] Non-resolvable parent POM for org.apache.spark:spark-parent_2.10:1.6.1: 
Could not transfer artifact org.apache:apache:pom:14 from/to central 
(https://repo1.maven.org/maven2): repo1.maven.org: unknown error and 
'parent.relativePath' points at wrong local POM @ line 22, column 11
 @ 
[ERROR] The build could not read 1 project -> [Help 1]
[ERROR]   
[ERROR]

Re: Why Non-resolvable parent POM for org.apache.spark:spark-parent_2.10:1.6.1: Could not transfer artifact org.apache:apache:pom:14 from/to central (https://repo1.maven.org/maven2): repo1.maven.org: un

2016-05-01 Thread Ted Yu
FYI

Accessing the link below gave me 'Page does not exist'

I am in California.

I checked the dependency tree of 1.6.1 - I didn't see such a dependency.

Can you pastebin the related Maven output?

Thanks

On Sun, May 1, 2016 at 6:32 AM, sunday2000 <2314476...@qq.com> wrote:

> Seems it is failing to download this URL:
> http://maven.twttr.com/org/apache/apache/14/apache-14.pom
>
>
> -- Original Message --
> *From:* "Ted Yu";;
> *Sent:* Sunday, May 1, 2016, 9:27 PM
> *To:* "sunday2000"<2314476...@qq.com>;
> *Cc:* "user";
> *Subject:* Re: Why Non-resolvable parent POM for
> org.apache.spark:spark-parent_2.10:1.6.1: Could not transfer artifact
> org.apache:apache:pom:14 from/to central (https://repo1.maven.org/maven2):
> repo1.maven.org: unknown error
>
> bq. Non-resolvable parent POM for org.apache.spark:spark-parent_2.10:1.6.1
>
> Looks like you were using Spark 1.6.1
>
> Can you check firewall settings ?
>
> I saw similar reports from Chinese users.
>
> Consider using proxy.
>
> On Sun, May 1, 2016 at 4:19 AM, sunday2000 <2314476...@qq.com> wrote:
>
>> Hi,
>>   We are compiling Spark 1.6.0 on a Linux server and getting this
>> error message. Could you tell us how to solve it? Thanks.
>>
>> [INFO] Scanning for projects...
>> Downloading:
>> https://repo1.maven.org/maven2/org/apache/apache/14/apache-14.pom
>> Downloading:
>> https://repository.apache.org/content/repositories/releases/org/apache/apache/14/apache-14.pom
>> Downloading:
>> https://repository.jboss.org/nexus/content/repositories/releases/org/apache/apache/14/apache-14.pom
>> Downloading:
>> https://repo.eclipse.org/content/repositories/paho-releases/org/apache/apache/14/apache-14.pom
>> Downloading:
>> https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/apache/14/apache-14.pom
>> Downloading:
>> https://oss.sonatype.org/content/repositories/orgspark-project-1113/org/apache/apache/14/apache-14.pom
>> Downloading:
>> http://repository.mapr.com/maven/org/apache/apache/14/apache-14.pom
>> Downloading: http://maven.twttr.com/org/apache/apache/14/apache-14.pom
>> [ERROR] [ERROR] Some problems were encountered while processing the POMs:
>> [FATAL] Non-resolvable parent POM for
>> org.apache.spark:spark-parent_2.10:1.6.1: Could not transfer artifact
>> org.apache:apache:pom:14 from/to central (https://repo1.maven.org/maven2):
>> repo1.maven.org: unknown error and 'parent.relativePath' points at wrong
>> local POM @ line 22, column 11
>>  @
>> [ERROR] The build could not read 1 project -> [Help 1]
>> [ERROR]
>> [ERROR]   The project org.apache.spark:spark-parent_2.10:1.6.1
>> (/data/spark/spark-1.6.1/pom.xml) has 1 error
>> [ERROR] Non-resolvable parent POM for
>> org.apache.spark:spark-parent_2.10:1.6.1: Could not transfer artifact
>> org.apache:apache:pom:14 from/to central (https://repo1.maven.org/maven2):
>> repo1.maven.org: unknown error and 'parent.relativePath' points at wrong
>> local POM @ line 22, column 11: Unknown host repo1.maven.org: unknown
>> error -> [Help 2]
>> [ERROR]
>> [ERROR] To see the full stack trace of the errors, re-run Maven with the
>> -e switch.
>> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
>> [ERROR]
>> [ERROR] For more information about the errors and possible solutions,
>> please read the following articles:
>> [ERROR] [Help 1]
>> http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException
>> [ERROR] [Help 2]
>> http://cwiki.apache.org/confluence/display/MAVEN/UnresolvableModelException
>>
>
>


Re: Error in spark-xml

2016-05-01 Thread Hyukjin Kwon
To be more clear,

If you set the rowTag as "book", then it will produce an exception, which
is an issue opened here: https://github.com/databricks/spark-xml/issues/92

Currently it does not support parsing a single element with only a value
as a row.


If you set the rowTag as "bkval", then it should work. I tested the case
below to double-check.

If it does not work as below, please open an issue with some information so
that I can reproduce it.


I tested the case above with the data below

<books>
  <bkval>
    <book>bk_113</book>
    <book>bk_114</book>
  </bkval>
  <bkval>
    <book>bk_114</book>
    <book>bk_116</book>
  </bkval>
  <bkval>
    <book>bk_115</book>
    <book>bk_116</book>
  </bkval>
</books>


I tested this with the code below

val path = "path-to-file"
sqlContext.read
  .format("xml")
  .option("rowTag", "bkval")
  .load(path)
  .show()


Thanks!


2016-05-01 15:11 GMT+09:00 Hyukjin Kwon :

> Hi Sourav,
>
> I think it is an issue. spark-xml assumes the element identified by the
> rowTag is an object.
>
> Could you please open an issue at
> https://github.com/databricks/spark-xml/issues?
>
> Thanks!
>
>
> 2016-05-01 5:08 GMT+09:00 Sourav Mazumder :
>
>> Hi,
>>
>> Looks like there is a problem in spark-xml if the XML has multiple
>> attributes with no child element.
>>
>> For example, say the XML has a nested object as below:
>> <bkval>
>> <book>bk_113</book>
>> <book>bk_114</book>
>> </bkval>
>>
>> Now if I create a DataFrame with rowTag bkval and then do a select on
>> that DataFrame, it gives the following error.
>>
>>
>> scala.MatchError: ENDDOCUMENT (of class
>> com.sun.xml.internal.stream.events.EndDocumentEvent) at
>> com.databricks.spark.xml.parsers.StaxXmlParser$.checkEndElement(StaxXmlParser.scala:94)
>> at
>> com.databricks.spark.xml.parsers.StaxXmlParser$.com$databricks$spark$xml$parsers$StaxXmlParser$$convertObject(StaxXmlParser.scala:295)
>> at
>> com.databricks.spark.xml.parsers.StaxXmlParser$$anonfun$parse$1$$anonfun$apply$4.apply(StaxXmlParser.scala:58)
>> at
>> com.databricks.spark.xml.parsers.StaxXmlParser$$anonfun$parse$1$$anonfun$apply$4.apply(StaxXmlParser.scala:46)
>> at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at
>> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) at
>> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) at
>> scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308) at
>> scala.collection.Iterator$class.foreach(Iterator.scala:727) at
>> scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at
>> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) at
>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>> at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>> at scala.collection.AbstractIterator.to(Iterator.scala:1157) at
>> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) at
>> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) at
>> org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:215)
>> at
>> org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:215)
>> at
>> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
>> at
>> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) at
>> org.apache.spark.scheduler.Task.run(Task.scala:88) at
>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>>
>> However, if there is only one row like below, it works fine.
>>
>> <bkval>
>> <book>bk_113</book>
>> </bkval>
>>
>> Any workaround?
>>
>> Regards,
>> Sourav
>>
>>
>


Re: Why Non-resolvable parent POM for org.apache.spark:spark-parent_2.10:1.6.1: Could not transfer artifact org.apache:apache:pom:14 from/to central (https://repo1.maven.org/maven2): repo1.maven.org:

2016-05-01 Thread sunday2000
Seems it is failing to download this URL:
http://maven.twttr.com/org/apache/apache/14/apache-14.pom




------------------ Original Message ------------------
From: "Ted Yu";
Sent: Sunday, May 1, 2016, 9:27 PM
To: "sunday2000" <2314476...@qq.com>; 
Cc: "user"; 
Subject: Re: Why Non-resolvable parent POM for 
org.apache.spark:spark-parent_2.10:1.6.1: Could not transfer artifact 
org.apache:apache:pom:14 from/to central (https://repo1.maven.org/maven2): 
repo1.maven.org: unknown error



bq. Non-resolvable parent POM for org.apache.spark:spark-parent_2.10:1.6.1

Looks like you were using Spark 1.6.1


Can you check firewall settings ?


I saw similar reports from Chinese users.


Consider using proxy.


On Sun, May 1, 2016 at 4:19 AM, sunday2000 <2314476...@qq.com> wrote:
Hi,
  We are compiling Spark 1.6.0 on a Linux server and getting this error 
message. Could you tell us how to solve it? Thanks.


[INFO] Scanning for projects...
Downloading: https://repo1.maven.org/maven2/org/apache/apache/14/apache-14.pom
Downloading: 
https://repository.apache.org/content/repositories/releases/org/apache/apache/14/apache-14.pom
Downloading: 
https://repository.jboss.org/nexus/content/repositories/releases/org/apache/apache/14/apache-14.pom
Downloading: 
https://repo.eclipse.org/content/repositories/paho-releases/org/apache/apache/14/apache-14.pom
Downloading: 
https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/apache/14/apache-14.pom
Downloading: 
https://oss.sonatype.org/content/repositories/orgspark-project-1113/org/apache/apache/14/apache-14.pom
Downloading: http://repository.mapr.com/maven/org/apache/apache/14/apache-14.pom
Downloading: http://maven.twttr.com/org/apache/apache/14/apache-14.pom
[ERROR] [ERROR] Some problems were encountered while processing the POMs:
[FATAL] Non-resolvable parent POM for org.apache.spark:spark-parent_2.10:1.6.1: 
Could not transfer artifact org.apache:apache:pom:14 from/to central 
(https://repo1.maven.org/maven2): repo1.maven.org: unknown error and 
'parent.relativePath' points at wrong local POM @ line 22, column 11
 @ 
[ERROR] The build could not read 1 project -> [Help 1]
[ERROR]   
[ERROR]   The project org.apache.spark:spark-parent_2.10:1.6.1 
(/data/spark/spark-1.6.1/pom.xml) has 1 error
[ERROR] Non-resolvable parent POM for 
org.apache.spark:spark-parent_2.10:1.6.1: Could not transfer artifact 
org.apache:apache:pom:14 from/to central (https://repo1.maven.org/maven2): 
repo1.maven.org: unknown error and 'parent.relativePath' points at wrong local 
POM @ line 22, column 11: Unknown host repo1.maven.org: unknown error -> [Help 
2]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException
[ERROR] [Help 2] 
http://cwiki.apache.org/confluence/display/MAVEN/UnresolvableModelException

Re: Can not import KafkaProducer in spark streaming job

2016-05-01 Thread Ted Yu
According to
examples/src/main/scala/org/apache/spark/examples/streaming/KafkaWordCount.scala
:

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig,
ProducerRecord}

Can you give the command line you used to submit the job ?

Probably classpath issue.

On Sun, May 1, 2016 at 5:11 AM, fanooos  wrote:

> I have a very strange problem.
>
> I wrote a Spark Streaming job that monitors an HDFS directory, reads the
> newly added files, and sends the contents to Kafka.
>
> The job is written in Python and you can get the code from this link:
>
> http://pastebin.com/mpKkMkph
>
> When submitting the job I got that error
>
> *ImportError: cannot import name KafkaProducer*
>
> As you see, the error is very simple, but the problem is that I could import
> the KafkaProducer from both the Python and PySpark shells without any problem.
>
> I tried to reboot the machine but the situation remains the same.
>
> What do you think the problem is?
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Can-not-import-KafkaProducer-in-spark-streaming-job-tp26857.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
>
>


Re: Why Non-resolvable parent POM for org.apache.spark:spark-parent_2.10:1.6.1: Could not transfer artifact org.apache:apache:pom:14 from/to central (https://repo1.maven.org/maven2): repo1.maven.org:

2016-05-01 Thread Ted Yu
bq. Non-resolvable parent POM for org.apache.spark:spark-parent_2.10:1.6.1

Looks like you were using Spark 1.6.1

Can you check firewall settings ?

I saw similar reports from Chinese users.

Consider using proxy.
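
As an illustration of the proxy suggestion, Maven reads proxy settings from
~/.m2/settings.xml; the host and port below are placeholders:

<!-- ~/.m2/settings.xml: sketch only; proxy host/port are placeholders -->
<settings>
  <proxies>
    <proxy>
      <id>example-proxy</id>
      <active>true</active>
      <protocol>http</protocol>
      <host>proxy.example.com</host>
      <port>8080</port>
    </proxy>
  </proxies>
</settings>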

On Sun, May 1, 2016 at 4:19 AM, sunday2000 <2314476...@qq.com> wrote:

> Hi,
>   We are compiling Spark 1.6.0 on a Linux server and getting this error
> message. Could you tell us how to solve it? Thanks.
>
> [INFO] Scanning for projects...
> Downloading:
> https://repo1.maven.org/maven2/org/apache/apache/14/apache-14.pom
> Downloading:
> https://repository.apache.org/content/repositories/releases/org/apache/apache/14/apache-14.pom
> Downloading:
> https://repository.jboss.org/nexus/content/repositories/releases/org/apache/apache/14/apache-14.pom
> Downloading:
> https://repo.eclipse.org/content/repositories/paho-releases/org/apache/apache/14/apache-14.pom
> Downloading:
> https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/apache/14/apache-14.pom
> Downloading:
> https://oss.sonatype.org/content/repositories/orgspark-project-1113/org/apache/apache/14/apache-14.pom
> Downloading:
> http://repository.mapr.com/maven/org/apache/apache/14/apache-14.pom
> Downloading: http://maven.twttr.com/org/apache/apache/14/apache-14.pom
> [ERROR] [ERROR] Some problems were encountered while processing the POMs:
> [FATAL] Non-resolvable parent POM for
> org.apache.spark:spark-parent_2.10:1.6.1: Could not transfer artifact
> org.apache:apache:pom:14 from/to central (https://repo1.maven.org/maven2):
> repo1.maven.org: unknown error and 'parent.relativePath' points at wrong
> local POM @ line 22, column 11
>  @
> [ERROR] The build could not read 1 project -> [Help 1]
> [ERROR]
> [ERROR]   The project org.apache.spark:spark-parent_2.10:1.6.1
> (/data/spark/spark-1.6.1/pom.xml) has 1 error
> [ERROR] Non-resolvable parent POM for
> org.apache.spark:spark-parent_2.10:1.6.1: Could not transfer artifact
> org.apache:apache:pom:14 from/to central (https://repo1.maven.org/maven2):
> repo1.maven.org: unknown error and 'parent.relativePath' points at wrong
> local POM @ line 22, column 11: Unknown host repo1.maven.org: unknown
> error -> [Help 2]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the
> -e switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions,
> please read the following articles:
> [ERROR] [Help 1]
> http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException
> [ERROR] [Help 2]
> http://cwiki.apache.org/confluence/display/MAVEN/UnresolvableModelException
>


Why Non-resolvable parent POM for org.apache.spark:spark-parent_2.10:1.6.1: Could not transfer artifact org.apache:apache:pom:14 from/to central (https://repo1.maven.org/maven2): repo1.maven.org: unkn

2016-05-01 Thread sunday2000
Hi,
  We are compiling Spark 1.6.0 on a Linux server and getting this error 
message. Could you tell us how to solve it? Thanks.


[INFO] Scanning for projects...
Downloading: https://repo1.maven.org/maven2/org/apache/apache/14/apache-14.pom
Downloading: 
https://repository.apache.org/content/repositories/releases/org/apache/apache/14/apache-14.pom
Downloading: 
https://repository.jboss.org/nexus/content/repositories/releases/org/apache/apache/14/apache-14.pom
Downloading: 
https://repo.eclipse.org/content/repositories/paho-releases/org/apache/apache/14/apache-14.pom
Downloading: 
https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/apache/14/apache-14.pom
Downloading: 
https://oss.sonatype.org/content/repositories/orgspark-project-1113/org/apache/apache/14/apache-14.pom
Downloading: http://repository.mapr.com/maven/org/apache/apache/14/apache-14.pom
Downloading: http://maven.twttr.com/org/apache/apache/14/apache-14.pom
[ERROR] [ERROR] Some problems were encountered while processing the POMs:
[FATAL] Non-resolvable parent POM for org.apache.spark:spark-parent_2.10:1.6.1: 
Could not transfer artifact org.apache:apache:pom:14 from/to central 
(https://repo1.maven.org/maven2): repo1.maven.org: unknown error and 
'parent.relativePath' points at wrong local POM @ line 22, column 11
 @ 
[ERROR] The build could not read 1 project -> [Help 1]
[ERROR]   
[ERROR]   The project org.apache.spark:spark-parent_2.10:1.6.1 
(/data/spark/spark-1.6.1/pom.xml) has 1 error
[ERROR] Non-resolvable parent POM for 
org.apache.spark:spark-parent_2.10:1.6.1: Could not transfer artifact 
org.apache:apache:pom:14 from/to central (https://repo1.maven.org/maven2): 
repo1.maven.org: unknown error and 'parent.relativePath' points at wrong local 
POM @ line 22, column 11: Unknown host repo1.maven.org: unknown error -> [Help 
2]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException
[ERROR] [Help 2] 
http://cwiki.apache.org/confluence/display/MAVEN/UnresolvableModelException

Can not import KafkaProducer in spark streaming job

2016-05-01 Thread fanooos
I have a very strange problem. 

I wrote a Spark Streaming job that monitors an HDFS directory, reads the newly
added files, and sends the contents to Kafka.

The job is written in Python and you can get the code from this link:

http://pastebin.com/mpKkMkph

When submitting the job I got that error 

*ImportError: cannot import name KafkaProducer*

As you see, the error is very simple, but the problem is that I could import
the KafkaProducer from both the Python and PySpark shells without any problem.

I tried to reboot the machine but the situation remains the same.

What do you think the problem is?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Can-not-import-KafkaProducer-in-spark-streaming-job-tp26857.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Why Non-resolvable parent POM for org.apache.spark:spark-parent_2.10:1.6.1: Could not transfer artifact org.apache:apache:pom:14 from/to central (https://repo1.maven.org/maven2): repo1.maven.org: unkn

2016-05-01 Thread sunday2000
Hi,
  We are compiling Spark 1.6.0 on a Linux server and getting this error 
message. Could you tell us how to solve it? Thanks.


[INFO] Scanning for projects...
Downloading: https://repo1.maven.org/maven2/org/apache/apache/14/apache-14.pom
Downloading: 
https://repository.apache.org/content/repositories/releases/org/apache/apache/14/apache-14.pom
Downloading: 
https://repository.jboss.org/nexus/content/repositories/releases/org/apache/apache/14/apache-14.pom
Downloading: 
https://repo.eclipse.org/content/repositories/paho-releases/org/apache/apache/14/apache-14.pom
Downloading: 
https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/apache/14/apache-14.pom
Downloading: 
https://oss.sonatype.org/content/repositories/orgspark-project-1113/org/apache/apache/14/apache-14.pom
Downloading: http://repository.mapr.com/maven/org/apache/apache/14/apache-14.pom
Downloading: http://maven.twttr.com/org/apache/apache/14/apache-14.pom
[ERROR] [ERROR] Some problems were encountered while processing the POMs:
[FATAL] Non-resolvable parent POM for org.apache.spark:spark-parent_2.10:1.6.1: 
Could not transfer artifact org.apache:apache:pom:14 from/to central 
(https://repo1.maven.org/maven2): repo1.maven.org: unknown error and 
'parent.relativePath' points at wrong local POM @ line 22, column 11
 @ 
[ERROR] The build could not read 1 project -> [Help 1]
[ERROR]   
[ERROR]   The project org.apache.spark:spark-parent_2.10:1.6.1 
(/data/spark/spark-1.6.1/pom.xml) has 1 error
[ERROR] Non-resolvable parent POM for 
org.apache.spark:spark-parent_2.10:1.6.1: Could not transfer artifact 
org.apache:apache:pom:14 from/to central (https://repo1.maven.org/maven2): 
repo1.maven.org: unknown error and 'parent.relativePath' points at wrong local 
POM @ line 22, column 11: Unknown host repo1.maven.org: unknown error -> [Help 
2]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException
[ERROR] [Help 2] 
http://cwiki.apache.org/confluence/display/MAVEN/UnresolvableModelException

Spark 1.6.1 issue fetching data via JDBC in Spark-shell

2016-05-01 Thread Mich Talebzadeh
Hi,

This sounds like a problem introduced in spark-shell 1.6.1.

Objective:  Use JDBC connection in Spark shell to get data from RDBMS table
(in this case Oracle)

Results: JDBC connection is made OK but the collection fails with error

ERROR TaskSetManager: Task 0 in stage 2.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage
2.0 (TID 11, rhes564): java.lang.IllegalStateException: Did not find
registered driver with class oracle.jdbc.OracleDriver

Details

*Spark 1.6.1*

1) Create a simple JDBC connection in Spark-shell where the Oracle jar file
is loaded as below

spark-shell --master spark://50.140.197.217:7077 --jars
/home/hduser/jars/ojdbc6.jar

scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
HiveContext: org.apache.spark.sql.hive.HiveContext =
org.apache.spark.sql.hive.HiveContext@500bde5b
scala> var _ORACLEserver : String = "jdbc:oracle:thin:@rhes564:1521:mydb12"
_ORACLEserver: String = jdbc:oracle:thin:@rhes564:1521:mydb12
scala> var _username : String = "sh"
_username: String = sh
scala> var _password : String = "x"
_password: String = sh
scala> val c = HiveContext.load("jdbc",
 | Map("url" -> _ORACLEserver,
 | "dbtable" -> "(SELECT to_char(CHANNEL_ID) AS CHANNEL_ID,
CHANNEL_DESC FROM sh.channels)",
 | "user" -> _username,
 | "password" -> _password))
warning: there were 1 deprecation warning(s); re-run with -deprecation for
details
c: org.apache.spark.sql.DataFrame = [CHANNEL_ID: string, CHANNEL_DESC:
string]

This works

scala> c.printSchema
root
 |-- CHANNEL_ID: string (nullable = true)
 |-- CHANNEL_DESC: string (nullable = false)

*This fails*

*scala> c.first*
16/05/01 10:06:13 ERROR TaskSetManager: Task 0 in stage 2.0 failed 4 times;
aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage
2.0 (TID 11, rhes564): java.lang.IllegalStateException: Did not find
registered driver with class oracle.jdbc.OracleDriver

*In Spark 1.5.2 it works*

      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
      /_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java
1.8.0_77)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
SQL context available as sqlContext.
scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
HiveContext: org.apache.spark.sql.hive.HiveContext =
org.apache.spark.sql.hive.HiveContext@e87c4cf
scala> var _ORACLEserver : String = "jdbc:oracle:thin:@rhes564:1521:mydb12"
_ORACLEserver: String = jdbc:oracle:thin:@rhes564:1521:mydb12
scala> var _username : String = "sh"
_username: String = sh
scala> var _password : String = "sh"
_password: String = sh
scala> val c = HiveContext.load("jdbc",
 | Map("url" -> _ORACLEserver,
 | "dbtable" -> "(SELECT to_char(CHANNEL_ID) AS CHANNEL_ID,
CHANNEL_DESC FROM sh.channels)",
 | "user" -> _username,
 | "password" -> _password))
warning: there were 1 deprecation warning(s); re-run with -deprecation for
details
c: org.apache.spark.sql.DataFrame = [CHANNEL_ID: string, CHANNEL_DESC:
string]
scala> c.printSchema
root
 |-- CHANNEL_ID: string (nullable = true)
 |-- CHANNEL_DESC: string (nullable = false)

*This works in Spark 1.5.2 but fails in Spark 1.6.1*


*scala> c.first*
res1: org.apache.spark.sql.Row = [3,Direct Sales]

The workaround for now is to use Maven or sbt to create a jar file and use
that with spark-submit, which is not really ideal.
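
One workaround reported for this "Did not find registered driver" symptom (an
assumption here, not something confirmed in the thread) is to name the driver
class explicitly in the JDBC options so the executors register it as well:

// Sketch: same connection details as above, plus an explicit driver class.
val c = HiveContext.read.format("jdbc").options(Map(
  "url" -> _ORACLEserver,
  "driver" -> "oracle.jdbc.OracleDriver",
  "dbtable" -> "(SELECT to_char(CHANNEL_ID) AS CHANNEL_ID, CHANNEL_DESC FROM sh.channels)",
  "user" -> _username,
  "password" -> _password)).load()
c.first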


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Re: Bit(N) on create Table with MSSQLServer

2016-05-01 Thread Mich Talebzadeh
Well, if MSSQL cannot create that column, then it is more a compatibility
issue between Spark and the RDBMS.

What value does that column have in MSSQL? Can you create the table in the
MSSQL database, or map it in Spark to a valid column type before opening the
JDBC connection?

HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 29 April 2016 at 16:16, Andrés Ivaldi  wrote:

> Hello, Spark is executing a CREATE TABLE statement (using JDBC) against
> MSSQLServer with a column type mapping like ColName Bit(1) for boolean
> types. This CREATE TABLE cannot be executed on MSSQLServer.
>
> In the JdbcDialect class the mapping for the Boolean type is Bit(1), so the
> question is: is this a problem with Spark, or with the JDBC driver not
> mapping it correctly?
>
> Anyway, is it possible to override that mapping in Spark?
>
> Regards
>
> --
> Ing. Ivaldi Andres
>
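
On the override question: a minimal sketch of a custom dialect using Spark's
JdbcDialects API (available since 1.4); the SQL Server URL prefix and the
target type BIT are illustrative assumptions:

import java.sql.Types
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types._

// Sketch: override how BooleanType is rendered in generated DDL for
// SQL Server connections.
object MsSqlServerDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:sqlserver")

  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case BooleanType => Some(JdbcType("BIT", Types.BIT))
    case _ => None // fall back to the default mapping
  }
}

JdbcDialects.registerDialect(MsSqlServerDialect)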