Hi, I want to ask about an issue I have faced while using Spark. I load
dataframes from parquet files. Some dataframes' parquet files have lots of
partitions, around 10 million rows.
Running a "where id = x" query on the dataframe scans all partitions. When
saving to an rdd object file / parquet there is a partition column. The
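A minimal sketch of the partitioned-write approach (assuming the Spark 1.4+
DataFrameWriter API; df is an assumed handle to the loaded dataframe, and the
path and the use of "id" as the partition column are placeholders, not from
the thread):

// Write the data partitioned by the filter column, so a "where id = x"
// query reads only the matching directory instead of scanning everything.
df.write.partitionBy("id").parquet("/tmp/events_by_id")

// Reading back: the filter on the partition column prunes directories,
// so only /tmp/events_by_id/id=42 is touched.
val hits = sqlContext.read.parquet("/tmp/events_by_id").filter("id = 42")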
,Utm_Campaign),
left)
When I do this I get the error: too many arguments for method apply.
Thanks
Bipin
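A sketch of the usual fix, with hypothetical stand-ins (bookings, campaigns)
for the tables elided above: combine the conditions into a single Column with
&&, and pass the join type as a separate string argument, instead of handing
several expressions to one apply call.

// Two equi-join conditions merged with && into one Column expression.
val joined = bookings.join(campaigns,
  bookings("CustomerId") === campaigns("CustomerId") &&
    bookings("Utm_Campaign") === campaigns("Utm_Campaign"),
  "left_outer")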
writing wide tables.
Cheng
On 6/15/15 5:48 AM, Bipin Nag wrote:
Hi Davies,
I have tried the recent 1.4 and 1.5-snapshot to 1) open the parquet and save
it again, or 2) apply the schema to the rdd and save the dataframe as parquet,
but now I get this error (right at the beginning
recently.
On Fri, Jun 12, 2015 at 5:38 AM, Bipin Nag bipin@gmail.com wrote:
Hi Cheng,
Yes, some rows contain Unit instead of decimal values. I believe some rows
from the original source I had don't have any value, i.e. they are null, and
that shows up as Unit. How does spark-sql or parquet
to change it properly.
Thanks for helping out.
Bipin
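For reference, one way to scrub those Unit placeholders before writing; a
hedged sketch, where rawRdd (an RDD[Array[Object]]) and schema are assumed
names, not from the thread:

// Replace BoxedUnit placeholders with null so Parquet writes a proper
// null instead of failing the cast to Decimal.
val cleanedRows = rawRdd.map { arr =>
  org.apache.spark.sql.Row(arr.map {
    case _: scala.runtime.BoxedUnit => null
    case other => other
  }: _*)
}
val cleaned = sqlContext.createDataFrame(cleanedRows, schema)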
On 12 June 2015 at 14:57, Cheng Lian lian.cs@gmail.com wrote:
On 6/10/15 8:53 PM, Bipin Nag wrote:
Hi Cheng,
I am using the Spark 1.3.1 binary available for Hadoop 2.6. I am loading an
existing parquet file, then repartitioning
/+name);
For applying the schema and saving the parquet:
val myschema = schemamap(name)
val myrdd = sc.objectFile[Array[Object]]("/home/bipin/rawdata/" + name)
  .map(x => org.apache.spark.sql.Row(x: _*))
val actualdata = sqlContext.createDataFrame(myrdd, myschema)
actualdata.saveAsParquetFile
I suspect that Bookings and Customerdetails both have a PolicyType field;
one is a string and the other is an int.
Cheng
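If that is the case, a sketch of one possible fix (hedged: the val name
customerdetails and the choice to cast to string rather than int are
assumptions) is to align the types before joining or saving:

// Cast PolicyType on one side so both frames agree on a single type.
val customerdetailsFixed = customerdetails.withColumn(
  "PolicyType", customerdetails("PolicyType").cast("string"))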
On 6/8/15 9:15 PM, Bipin Nag wrote:
Hi Jeetendra, Cheng
I am using the following code for the join:
val Bookings = sqlContext.load("/home/administrator
Hi,
When I try to save my data frame as a parquet file I get the following
error:
java.lang.ClassCastException: scala.runtime.BoxedUnit cannot be cast to
org.apache.spark.sql.types.Decimal
at
org.apache.spark.sql.parquet.RowWriteSupport.writePrimitive(ParquetTableSupport.scala:220)
Hi, I get this error message when saving a table:
parquet.io.ParquetDecodingException: The requested schema is not compatible
with the file schema. incompatible types: optional binary PolicyType (UTF8)
!= optional int32 PolicyType
at
found a column with conflicting data types.
Cheng
/bipin/rawdata/+name)
But I get
java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to
org.apache.spark.sql.Row
How do I work around this? Is there a better way?
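The workaround, along the lines of the snippet earlier in this digest (the
path and schema names are placeholders): read the object file with the
element type that was actually stored, and build each Row explicitly, since
objectFile[Row] cannot cast the underlying Array[Object].

import org.apache.spark.sql.Row
// Load with the stored element type, then convert each array to a Row.
val rows = sc.objectFile[Array[Object]]("/home/bipin/rawdata/" + name)
  .map(arr => Row(arr: _*))
val df = sqlContext.createDataFrame(rows, myschema)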
Hi, I have a ddf with schema (CustomerID, SupplierID, ProductID, Event,
CreatedOn); the first 3 are Long ints, Event can only be 1, 2, or 3, and
CreatedOn is a timestamp. How can I make a group triplet/doublet/singlet out
of them such that I can infer that a Customer registered an event from 1 to 2
and if
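One possible shape for this, as a hedged sketch (it assumes an RDD of a case
class mirroring the schema above; nothing here is from the thread): group by
the (CustomerID, SupplierID, ProductID) triplet and sort each group by
CreatedOn, giving the ordered event sequence per group so a 1 -> 2 transition
can be read off.

import java.sql.Timestamp
case class Ev(customerId: Long, supplierId: Long, productId: Long,
              event: Int, createdOn: Timestamp)

// events: RDD[Ev], assumed loaded elsewhere. The value is the ordered
// list of event codes per triplet, e.g. Seq(1, 2) means the customer
// registered event 1 and then moved to event 2.
val sequences = events
  .groupBy(e => (e.customerId, e.supplierId, e.productId))
  .mapValues(_.toSeq.sortBy(_.createdOn.getTime).map(_.event))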
I was running the spark shell and sql with the --jars option containing the
paths when I got my error. I am not sure what the correct way to add jars is.
I tried placing the jar inside the directory you said, but I still get the
error. I will give the code you posted a try. Thanks.
I am running the queries from spark-sql. I don't think it can communicate
with the thrift server. Can you tell me how I should run the queries to make
it work?
Looks like a good option. BTW, v3.0 is around the corner:
http://slick.typesafe.com/news/2015/04/02/slick-3.0.0-RC3-released.html
Thanks
Hi, I imported a table from an MS SQL server with Sqoop 1.4.5 in parquet
format. But when I try to load it from the Spark shell, it throws an error
like:
scala> val df1 = sqlContext.load("/home/bipin/Customer2")
scala.collection.parallel.CompositeThrowable: Multiple exceptions thrown
during a parallel
There were some thoughts - credit to Cheng Lian for this - about making the
JDBC data source extensible for third party support, possibly via Slick.
On Mon, Apr 6, 2015 at 10:41 PM bipin bipin@gmail.com wrote:
Hi, I am trying to pull data from an MS SQL server. I have tried using the
spark.sql.jdbc data source:
CREATE TEMPORARY TABLE c
USING org.apache.spark.sql.jdbc
OPTIONS (
  url "jdbc:sqlserver://10.1.0.12:1433\;databaseName=dbname\;",
  dbtable "Customer"
);
But it shows: java.sql.SQLException: No suitable driver found
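For what it is worth, a sketch of the usual fix (hedged: it assumes the
Microsoft driver jar is already on the driver classpath, e.g. the shell was
started with --driver-class-path sqljdbc4.jar, and it uses the data source's
"driver" option): name the driver class explicitly so DriverManager can
locate it.

// Same URL/table as above; the driver option registers the JDBC class.
val customer = sqlContext.load("jdbc", Map(
  "url" -> "jdbc:sqlserver://10.1.0.12:1433;databaseName=dbname",
  "dbtable" -> "Customer",
  "driver" -> "com.microsoft.sqlserver.jdbc.SQLServerDriver"))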