Hi everyone,
I'm trying to read a text file encoded as UTF-16LE, but I'm getting weird
characters like this:
�� W h e n
My code is this one:
sparkSession
.read
.format("text")
.option("charset", "UTF-16LE")
.load("textfile.txt")
I'm using Spark 2.3.1. Any idea how to fix this?
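The "�� W h e n" pattern is the classic sign of UTF-16LE bytes decoded with a one-byte charset: the two byte-order-mark bytes (0xFF 0xFE) come out as "��", and the NUL high bytes between the ASCII characters render as separators. A minimal plain-Scala sketch of the effect, outside Spark (the sample string "When" matches the garbled output above; whether the `text` source honors a charset option in a given Spark release is worth checking separately, and the CSV reader's `encoding` option is a commonly suggested workaround):

```scala
import java.nio.charset.StandardCharsets

// "When" encoded as UTF-16LE, preceded by the 0xFF 0xFE byte-order mark.
val bytes: Array[Byte] =
  Array(0xFF, 0xFE, 'W', 0, 'h', 0, 'e', 0, 'n', 0).map(_.toByte)

// Decoding with the right charset recovers the text (plus the BOM character).
val right = new String(bytes, StandardCharsets.UTF_16LE)

// Decoding with a one-byte charset keeps the BOM bytes as two junk characters
// and the NUL high bytes as separators: exactly the "W h e n" effect above.
val wrong = new String(bytes, StandardCharsets.ISO_8859_1)
```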
Hi,
I have the following issue,
case class Item (c1: String, c2: String, c3: Option[BigDecimal])
import sparkSession.implicits._
val result = df.as[Item].groupByKey(_.c1).mapGroups((key, value) => {
  value
})
But I get the following error at compile time:
Unable to find encoder for type
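A likely cause (an assumption, since the snippet is truncated): the function passed to `mapGroups` returns the group's `Iterator` itself, and Spark has no `Encoder` for `Iterator`. Materializing it with `.toList`/`.toSeq`, or mapping it to a case class, usually makes this compile. A plain-Scala sketch of the same grouping shape, with hypothetical data:

```scala
case class Item(c1: String, c2: String, c3: Option[BigDecimal])

val items = Seq(
  Item("a", "x", None),
  Item("a", "y", Some(BigDecimal(1))),
  Item("b", "z", None))

// Mirrors groupByKey(_.c1).mapGroups(...): return a concrete collection
// (here via .toList), not the Iterator itself, so an Encoder can be found.
val grouped: Map[String, List[Item]] =
  items.groupBy(_.c1).map { case (key, values) => key -> values.toList }
```

In the original code that would mean something like `.mapGroups((key, values) => values.toList)`.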
I'm trying to build an application where it is necessary to do bulkGet and
bulkLoad on HBase.
I think that I could use this component
https://github.com/hortonworks-spark/shc
Is it a good option?
But I can't import it in my project: sbt cannot resolve the hbase
connector.
This is my build.sbt:
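A common reason sbt cannot resolve shc is that it is published to the Hortonworks repository, not Maven Central, so a resolver has to be added. A hedged build.sbt fragment (the version string and Scala suffix below are assumptions; check the project's README for the coordinates matching your Spark build):

```scala
resolvers += "Hortonworks" at "https://repo.hortonworks.com/content/groups/public/"

// Hypothetical version; pick the one matching your Spark/Scala versions.
libraryDependencies += "com.hortonworks" % "shc-core" % "1.1.1-2.1-s_2.11"
```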
Hi.
I'm testing "spark testing base". For example:
class MyFirstTest extends FunSuite with SharedSparkContext {
  def tokenize(f: RDD[String]) = {
    f.map(_.split(" ").toList)
  }
  test("really simple transformation") {
    val input = List("hi", "hi miguel", "bye")
    val expected =
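The truncated test above follows the usual spark-testing-base pattern: build the expected value locally and compare it against the collected RDD. The tokenize logic itself can be checked without a cluster; a plain-Scala sketch (the `expected` value below is my assumption of where the cut-off snippet was going):

```scala
def tokenize(lines: Seq[String]): Seq[List[String]] =
  lines.map(_.split(" ").toList)

val input = List("hi", "hi miguel", "bye")
val expected = List(List("hi"), List("hi", "miguel"), List("bye"))

// In the Spark test this becomes tokenize(sc.parallelize(input)) and a
// comparison against collect(); the pure function can be checked directly.
val actual = tokenize(input)
```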
t you started
>> https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-IntelliJ
>>
>> Thanks
>> Best Regards
>>
>> On Sun, Nov 29, 2015 at 9:48 PM, Masf <masfwo...@gmail.com> wrote:
>>
>>> Hi
>>
Hi
Is it possible to debug Spark locally with IntelliJ or another IDE?
Thanks
--
Regards.
Miguel Ángel
Hi Ardo
Is there some tutorial on how to debug with IntelliJ?
Thanks
Regards.
Miguel.
On Sun, Nov 29, 2015 at 5:32 PM, Ndjido Ardo BAR <ndj...@gmail.com> wrote:
> hi,
>
> IntelliJ is just great for that!
>
> cheers,
> Ardo.
>
> On Sun, Nov 29, 2015 at 5:18 PM, Masf <ma
function
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.StreamingContext
as the second argument.
Thanks
Best Regards
On Wed, Aug 19, 2015 at 10:46 PM, Masf masfwo...@gmail.com wrote:
Hi.
I'd like to read Avro files using this library
https
Hi.
I have a dataframe and I want to insert these data into parquet partitioned
table in Hive.
In Spark 1.4 I can use
df.write.partitionBy("x", "y").format("parquet").mode("append").saveAsTable("tbl_parquet")
but in Spark 1.3 I can't. How can I do it?
Thanks
--
Regards
Miguel
Hi.
I'd like to read Avro files using this library
https://github.com/databricks/spark-avro
I need to load several files from a folder, not all files. Is there some
functionality to filter the files to load?
And... is it possible to know the names of the files loaded from a folder?
My problem
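On the first question: `load()` accepts explicit paths (and Hadoop glob patterns), so one approach is to list the folder yourself, filter the names, and pass only the survivors. A plain-Scala sketch of the filtering step, with hypothetical file names (in practice the listing would come from a `FileSystem.listStatus` call against HDFS):

```scala
// Hypothetical folder contents.
val all = Seq("part-2015-01.avro", "part-2015-02.avro",
              "part-2014-12.avro", "_SUCCESS")

def wanted(name: String): Boolean =
  name.endsWith(".avro") && name.startsWith("part-2015")

val selected = all.filter(wanted)
// Then (hedged): sqlContext.read.format("com.databricks.spark.avro")
//                  .load(selected: _*)
```

Keeping the filtered list around also answers the second question: you know exactly which files were loaded.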
Hi.
I have 2 dataframes with 1 and 12 partitions respectively. When I do an inner
join between these dataframes, the result contains 200 partitions. Why?
df1.join(df2, df1("id") === df2("id"), "inner") => returns 200 partitions
Thanks!!!
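The 200 does not come from the inputs: `spark.sql.shuffle.partitions` (default 200) fixes the partition count of any shuffle a join or aggregation produces, regardless of how the inputs were partitioned. A hedged config sketch to change it:

```scala
// Default is 200; a join between a 1-partition and a 12-partition dataframe
// still produces this many output partitions, because the join shuffles.
sqlContext.setConf("spark.sql.shuffle.partitions", "12")
```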
--
Regards.
Miguel Ángel
Hi.
I think that it's possible to do:
df.select($"*", lit(null).as("col17"), lit(null).as("col18"),
lit(null).as("col19"), ..., lit(null).as("col26"))
Any other advice?
Miguel.
On Wed, May 27, 2015 at 5:02 PM, Masf masfwo...@gmail.com wrote:
Hi.
I have a DataFrame with 16 columns (df1
Hi.
I have a DataFrame with 16 columns (df1) and another with 26 columns (df2).
I want to do a unionAll, so I want to add 10 columns to df1 in order to
have the same number of columns in both dataframes.
Is there some alternative to withColumn?
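Rather than writing ten `withColumn` (or `select`) entries by hand, the missing columns can be folded in programmatically. A plain-Scala stand-in using a `Map` as the row (the Spark form is sketched in the comment and is hedged, not compiled here):

```scala
// A row modeled as a plain Map, padded with the 10 missing column names.
val missing = (17 to 26).map(i => s"col$i")
val row = Map("col1" -> "a")

val padded = missing.foldLeft(row) { (acc, name) => acc + (name -> null) }

// Spark analogue (hedged):
//   missing.foldLeft(df1)((acc, c) => acc.withColumn(c, lit(null).cast("string")))
```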
Thanks
--
Regards.
Miguel Ángel
endrscp100 then 1 else 0 end test from j'
Let me know if this works.
On 26 May 2015 23:47, Masf masfwo...@gmail.com wrote:
Hi
I don't know how it works. For example:
val result = joinedData.groupBy("col1", "col2").agg(
  count(lit(1)).as("counter"),
  min("col3").as("minimum"),
sum(case when endrscp 100
guha.a...@gmail.com wrote:
CASE WHEN col2 > 100 THEN 1 ELSE col2 END
On 26 May 2015 00:25, Masf masfwo...@gmail.com wrote:
Hi.
In a dataframe, how can I execute a conditional expression inside an
aggregation? For example, can I translate this SQL statement to the DataFrame API?:
SELECT name, SUM
Hi.
In a dataframe, how can I execute a conditional expression inside an
aggregation? For example, can I translate this SQL statement to the DataFrame API?:
SELECT name, SUM(IF table.col2 > 100 THEN 1 ELSE table.col1)
FROM table
GROUP BY name
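In the DataFrame API the conditional moves inside the aggregate via `when`/`otherwise`. A plain-Scala model of the same aggregation on hypothetical rows, assuming the comparison stripped by the archiver was `>` (the Spark form is sketched in the comment):

```scala
case class Rec(name: String, col1: Int, col2: Int)

// Hypothetical rows standing in for `table`.
val rows = Seq(Rec("a", 5, 200), Rec("a", 3, 50), Rec("b", 7, 101))

// SELECT name, SUM(IF(col2 > 100, 1, col1)) FROM table GROUP BY name
val sums: Map[String, Int] = rows.groupBy(_.name).map { case (n, rs) =>
  n -> rs.map(r => if (r.col2 > 100) 1 else r.col1).sum
}

// DataFrame analogue (hedged):
//   df.groupBy("name").agg(sum(when(col("col2") > 100, 1).otherwise(col("col1"))))
```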
Thanks
--
Regards.
Miguel
Hi Eric.
Q1:
When I read parquet files, I've observed that Spark generates as many
partitions as there are parquet files in the path.
Q2:
To reduce the number of partitions you can use rdd.coalesce(x) or
rdd.repartition(x), where x is the number of partitions. Depending on your case,
repartition can be a heavy task because it shuffles all the data; coalesce
avoids a full shuffle when you are only reducing the partition count.
Regards.
Hi.
I have a Spark application where I store the results into a table (with
HiveContext). Some of these columns allow nulls. In Scala, these columns are
represented through Option[Int] or Option[Double], depending on the data type.
For example:
*val hc = new HiveContext(sc)*
*var col1:
Hi guys
Regarding parquet files: I have Spark 1.2.0, and reading 27 parquet files
(250 MB/file) takes 4 minutes.
I have a cluster with 4 nodes and that seems too slow to me.
The load function is not available in Spark 1.2, so I can't test it.
Regards.
Miguel.
On Mon, Apr 13, 2015 at 8:12 PM,
)?
--- Original Message ---
From: Masf masfwo...@gmail.com
Sent: April 9, 2015 11:45 PM
To: user@spark.apache.org
Subject: Increase partitions reading Parquet File
Hi
I have this statement:
val file =
  sqlContext.parquetFile("hdfs://node1/user/hive/warehouse/file.parquet")
This code
1, 2015 at 7:53 AM, Masf masfwo...@gmail.com wrote:
Hi.
In Spark SQL 1.2.0, with HiveContext, I'm executing the following
statement:
CREATE TABLE testTable STORED AS PARQUET AS
SELECT
field1
FROM table1
field1 is SMALLINT. If table1 is in text format everything is OK, but if
table1
Hi.
I'm using Spark SQL 1.2. I have this query:
CREATE TABLE test_MA STORED AS PARQUET AS
SELECT
field1
,field2
,field3
,field4
,field5
,COUNT(1) AS field6
,MAX(field7)
,MIN(field8)
,SUM(field9 / 100)
,COUNT(field10)
,SUM(IF(field11 > -500, 1, 0))
,MAX(field12)
,SUM(IF(field13 = 1, 1, 0))
Hi.
In Spark SQL 1.2.0, with HiveContext, I'm executing the following statement:
CREATE TABLE testTable STORED AS PARQUET AS
SELECT
field1
FROM table1
field1 is SMALLINT. If table1 is in text format everything is OK, but if table1
is in parquet format, Spark returns the following error:
Hi Ted.
Spark 1.2.0 and Hive 0.13.1.
Regards.
Miguel Angel.
On Tue, Mar 31, 2015 at 10:37 AM, Ted Yu yuzhih...@gmail.com wrote:
Which Spark and Hive release are you using ?
Thanks
On Mar 27, 2015, at 2:45 AM, Masf masfwo...@gmail.com wrote:
Hi.
In HiveContext, when I put
values:
Have you done the above modification on all the machines in your Spark
cluster ?
If you use Ubuntu, be sure that the /etc/pam.d/common-session file
contains the following line:
session required pam_limits.so
On Mon, Mar 30, 2015 at 5:08 AM, Masf masfwo...@gmail.com wrote:
Hi
Hi
I have a problem with temp data in Spark. I have set
spark.shuffle.manager to SORT. In /etc/security/limits.conf I set the following
values:
* soft nofile 100
* hard nofile 100
In spark-env.sh I set ulimit -n 100.
I've restarted the spark service and it
the machines to get the ulimit effect (or
relogin). What operation are you doing? Are you doing too many
repartitions?
Thanks
Best Regards
On Mon, Mar 30, 2015 at 4:52 PM, Masf masfwo...@gmail.com wrote:
Hi
I have a problem with temp data in Spark. I have fixed
spark.shuffle.manager
Hi.
In HiveContext, when I put this statement: DROP TABLE IF EXISTS TestTable.
If TestTable doesn't exist, Spark returns an error:
ERROR Hive: NoSuchObjectException(message:default.TestTable table not found)
at
of window function support in 1.4.0. But it's not a promise
yet.
Cheng
On 3/26/15 7:27 PM, Arush Kharbanda wrote:
It's not yet implemented.
https://issues.apache.org/jira/browse/SPARK-1442
On Thu, Mar 26, 2015 at 4:39 PM, Masf masfwo...@gmail.com wrote:
Hi.
Are the Windowing
Hi.
Are the windowing and analytics functions supported in Spark SQL (with
HiveContext or not)? For example, in Hive they are supported:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics
Is there some tutorial or documentation where I can see all the features
supported by Spark?
Hi
Spark 1.2.1 uses Scala 2.10. Because of this, your program fails with Scala
2.11.
Regards
On Thu, Mar 19, 2015 at 8:17 PM, Vijayasarathy Kannan kvi...@vt.edu wrote:
My current simple.sbt is
name := "SparkEpiFast"
version := "1.0"
scalaVersion := "2.11.4"
libraryDependencies +=
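Concretely, the fix is either to drop scalaVersion back to the 2.10 line that Spark 1.2.1 was published for, or to use a Spark build for 2.11. A hedged simple.sbt sketch for the first option (the exact 2.10 patch version is an assumption):

```scala
name := "SparkEpiFast"

version := "1.0"

// Spark 1.2.1 artifacts on Maven Central are built for Scala 2.10.
scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.1" % "provided"
```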
Hi.
I'm running Spark 1.2.0. I have HiveContext and I execute the following
query:
select sum(field1 / 100) from table1 group by field2;
field1 in the Hive metastore is a smallint. The schema detected by HiveContext
is an int32:
fileSchema: message schema {
optional int32 field1;
HiveContext for now?
On Fri, Mar 13, 2015 at 4:48 AM, Masf masfwo...@gmail.com wrote:
Hi.
I have a query in Spark SQL and I cannot convert a value to BIGINT:
CAST(column AS BIGINT) or
CAST(0 AS BIGINT)
The output is:
Exception in thread "main" java.lang.RuntimeException: [34.62] failure
Hi all.
When I specify the number of partitions and save this RDD in parquet
format, my app fails. For example:
selectTest.coalesce(28).saveAsParquetFile("hdfs://vm-clusterOutput")
However, it works well if I store data in text
selectTest.coalesce(28).saveAsTextFile("hdfs://vm-clusterOutput")
My
fail means here.
On Mon, Mar 16, 2015 at 11:11 AM, Masf masfwo...@gmail.com wrote:
Hi all.
When I specify the number of partitions and save this RDD in parquet
format, my app fails. For example:
selectTest.coalesce(28).saveAsParquetFile("hdfs://vm-clusterOutput")
However, it works well
Hi.
I have a query in Spark SQL and I cannot convert a value to BIGINT:
CAST(column AS BIGINT) or
CAST(0 AS BIGINT)
The output is:
Exception in thread "main" java.lang.RuntimeException: [34.62] failure:
``DECIMAL'' expected but identifier BIGINT found
Thanks!!
Regards.
Miguel Ángel
Thanks
Best Regards
On Wed, Mar 11, 2015 at 9:45 PM, Masf masfwo...@gmail.com wrote:
Hi all
Is it possible to read recursively folders to read parquet files?
Thanks.
--
Saludos.
Miguel Ángel
--
Saludos.
Miguel Ángel
Hi all
Is it possible to read recursively folders to read parquet files?
Thanks.
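One workaround, since parquetFile does not recurse on its own, is to walk the directory tree yourself and hand Spark the explicit file list. A hedged sketch (the directory layout is hypothetical; against HDFS the walk would use the Hadoop FileSystem API instead of java.io):

```scala
import java.io.File

// Recursively collect *.parquet paths under a base directory.
def listParquet(dir: File): Seq[String] = {
  val entries = Option(dir.listFiles).map(_.toSeq).getOrElse(Seq.empty[File])
  val (dirs, files) = entries.partition(_.isDirectory)
  files.filter(_.getName.endsWith(".parquet")).map(_.getPath) ++
    dirs.flatMap(listParquet)
}

// Then (hedged; in releases where parquetFile takes a single path, load the
// files one by one and unionAll the results):
//   val df = sqlContext.parquetFile(listParquet(new File("/data/base")): _*)
```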
--
Saludos.
Miguel Ángel