I have a 10g limit on the executors and am operating on a parquet dataset
with block size 70M and 200 blocks. I keep hitting the memory limits when
doing a 'select * from t1 order by c1 limit 1000000' (i.e., 1M). It works if
I limit to, say, 100k. What are the options to save a large dataset without
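A rough sketch of one option, assuming the goal is simply to persist the
sorted result rather than push a very large top-N through the LIMIT: sort and
write the full dataset instead (the output path below is made up):

val sorted = sqlContext.sql("select * from t1 order by c1")
// writing the sorted result avoids funnelling millions of rows through one LIMIT
sorted.write.parquet("hdfs:///tmp/t1_sorted_by_c1")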
Hi - I'm using s3 storage with spark and would like to use AWS credentials
provided by STS to authenticate. I'm doing the following to use those
credentials:
val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3.awsAccessKeyId",credentials.getAccessKeyId)
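With temporary STS credentials the session token also needs to be passed; a
minimal sketch using the s3a connector (assumes a Hadoop build whose s3a
supports session tokens, and the bucket path is hypothetical):

val hadoopConf = sc.hadoopConfiguration
// s3a (rather than s3/s3n) can use session tokens from STS
hadoopConf.set("fs.s3a.access.key", credentials.getAccessKeyId)
hadoopConf.set("fs.s3a.secret.key", credentials.getSecretAccessKey)
hadoopConf.set("fs.s3a.session.token", credentials.getSessionToken)
hadoopConf.set("fs.s3a.aws.credentials.provider",
  "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")

val df = sqlContext.read.parquet("s3a://some-bucket/some/path/")   // hypothetical bucket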
Hi, here we made several optimizations for accessing s3 from spark:
https://github.com/apache/spark/compare/branch-1.6...zalando:branch-1.6-zalando
such as:
https://github.com/apache/spark/compare/branch-1.6...zalando:branch-1.6-zalando#diff-d579db9a8f27e0bbef37720ab14ec3f6R133
you can deploy
If I have the same data, the same ratios, and same sample seed, will I get
the same splits every time?
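For context, a minimal sketch of the call in question (assuming
DataFrame.randomSplit; the input path and ratios are made up). With a fixed
seed the split is deterministic as long as the underlying data and its
partitioning are unchanged:

val df = sqlContext.read.parquet("hdfs:///tmp/input")   // hypothetical input
val Array(train, test) = df.randomSplit(Array(0.8, 0.2), seed = 42L)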
bq. Caused by: Compile failed via zinc server
Looks like Zinc got in the way of compilation.
Consider stopping Zinc and doing a clean build.
On Sun, May 1, 2016 at 8:35 AM, sunday2000 <2314476...@qq.com> wrote:
> Error message:
> [debug] External API changes: API Changes: Set()
> [debug] Modified
Error message:
[debug] External API changes: API Changes: Set()
[debug] Modified binary dependencies: Set()
[debug] Initial directly invalidated sources:
Set(/root/dl/spark-1.6.1/tags/src/main/java/org/apache/spark/tags/DockerTest.java,
Downloading:
https://repository.apache.org/content/repositories/releases/org/apache/apache/14/apache-14.pom
Downloading:
https://repository.jboss.org/nexus/content/repositories/releases/org/apache/apache/14/apache-14.pom
Downloading:
FYI
Accessing the link below gave me 'Page does not exist'
I am in California.
I checked the dependency tree of 1.6.1 - I didn't see such a dependency.
Can you pastebin related maven output ?
Thanks
On Sun, May 1, 2016 at 6:32 AM, sunday2000 <2314476...@qq.com> wrote:
> Seems it is because
To be more clear,
If you set the rowTag to "book", then it will produce an exception, which
is an issue opened here: https://github.com/databricks/spark-xml/issues/92
Currently it does not support parsing a single element with only a value
as a row.
If you set the rowTag to "bkval", then it
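For reference, a rough sketch of how rowTag is usually passed to spark-xml
(the input file name is made up):

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
// rowTag selects which XML element is treated as one row/object
val df = sqlContext.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "book")
  .load("books.xml")   // hypothetical file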
Seems it is because it fails to download this URL:
http://maven.twttr.com/org/apache/apache/14/apache-14.pom
------------------ Original message ------------------
From: "Ted Yu";
Date: Sun, May 1, 2016, 9:27
To: "sunday2000" <2314476...@qq.com>;
Subject:
According to
examples/src/main/scala/org/apache/spark/examples/streaming/KafkaWordCount.scala
:
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig,
ProducerRecord}
Can you give the command line you used to submit the job ?
Probably classpath issue.
On Sun, May 1, 2016 at
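For reference, roughly how KafkaWordCount uses those imports (the broker and
topic below are placeholders); if these classes are missing at runtime, the
Kafka client jar is probably not on the submit classpath:

import java.util.HashMap
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}

val props = new HashMap[String, Object]()
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")   // placeholder broker
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
  "org.apache.kafka.common.serialization.StringSerializer")
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
  "org.apache.kafka.common.serialization.StringSerializer")

val producer = new KafkaProducer[String, String](props)
producer.send(new ProducerRecord[String, String]("my-topic", "hello"))   // placeholder topic
producer.close()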
bq. Non-resolvable parent POM for org.apache.spark:spark-parent_2.10:1.6.1
Looks like you were using Spark 1.6.1
Can you check firewall settings ?
I saw similar report from Chinese users.
Consider using proxy.
On Sun, May 1, 2016 at 4:19 AM, sunday2000 <2314476...@qq.com> wrote:
> Hi,
> We
Hi,
We are compiling spark 1.6.0 on a Linux server and are getting this error
message. Could you tell us how to solve it? Thanks.
[INFO] Scanning for projects...
Downloading: https://repo1.maven.org/maven2/org/apache/apache/14/apache-14.pom
Downloading:
I have a very strange problem.
I wrote a spark streaming job that monitors an HDFS directory, reads the newly
added files, and sends the contents to Kafka.
The job is written in python and you can get the code from this link
http://pastebin.com/mpKkMkph
When submitting the job I got that error
Hi,
We are compiling spark 1.6.0 on a Linux server and are getting this error
message. Could you tell us how to solve it? Thanks.
[INFO] Scanning for projects...
Downloading: https://repo1.maven.org/maven2/org/apache/apache/14/apache-14.pom
Downloading:
Hi,
This sounds like a problem introduced in spark-shell 1.6.1.
Objective: Use JDBC connection in Spark shell to get data from RDBMS table
(in this case Oracle)
Results: JDBC connection is made OK but the collection fails with error
ERROR TaskSetManager: Task 0 in stage 2.0 failed 4 times;
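For comparison, a bare-bones JDBC read against Oracle in the 1.6 shell
(connection details below are placeholders, and the Oracle driver jar has to
be on both the driver and executor classpaths):

val df = sqlContext.read.format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
  .option("driver", "oracle.jdbc.OracleDriver")
  .option("dbtable", "SCOTT.EMP")
  .option("user", "scott")
  .option("password", "tiger")
  .load()

df.count()   // the actual fetch happens here, which is where task failures surface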
Well, if MSSQL cannot create that column then it is more of a compatibility
issue between Spark and the RDBMS.
What value does that column have in MSSQL? Can you create the table in the
MSSQL database first, or map it in Spark to a valid column before opening the
JDBC connection?
HTH
Dr Mich Talebzadeh
LinkedIn *
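As a rough illustration of mapping a column in Spark before writing over JDBC
(the column name, target type and connection details are all made up):

import java.util.Properties
import sqlContext.implicits._

val df = Seq((1, 2.5), (2, 3.5)).toDF("id", "problem_col")
// cast the problematic column to a type the target database can create
val fixed = df.withColumn("problem_col", df("problem_col").cast("string"))

val props = new Properties()
props.put("user", "sa")           // placeholder credentials
props.put("password", "secret")

fixed.write.jdbc("jdbc:sqlserver://dbhost:1433;databaseName=test", "dbo.target_table", props)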
Hi Sourav,
I think it is an issue. spark-xml treats the element specified by rowTag as
an object.
Could you please open an issue at
https://github.com/databricks/spark-xml/issues?
Thanks!
2016-05-01 5:08 GMT+09:00 Sourav Mazumder :
> Hi,
>
> Looks like there is a