[jira] [Updated] (HUDI-1551) Support Partition with BigDecimal/Integer field

2021-04-07 Thread Chanh Le (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chanh Le updated HUDI-1551: --- Description: In my data the time indicator field is a BigDecimal/Integer, due to trading-related data

[jira] [Updated] (HUDI-1551) Support Partition with BigDecimal/Integer field

2021-04-07 Thread Chanh Le (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chanh Le updated HUDI-1551: --- Fix Version/s: (was: 0.7.0) > Support Partition with BigDecimal/Integer field

[jira] [Updated] (HUDI-1551) Support Partition with BigDecimal/Integer field

2021-04-07 Thread Chanh Le (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chanh Le updated HUDI-1551: --- Summary: Support Partition with BigDecimal/Integer field (was: Support Partition with BigDecimal field)

[jira] [Created] (HUDI-1551) Support Partition with BigDecimal field

2021-01-25 Thread Chanh Le (Jira)
Chanh Le created HUDI-1551: -- Summary: Support Partition with BigDecimal field Key: HUDI-1551 URL: https://issues.apache.org/jira/browse/HUDI-1551 Project: Apache Hudi Issue Type: New Feature

Re: SPARK environment settings issue when deploying a custom distribution

2017-06-12 Thread Chanh Le
=2.7.0 -Phive -Phive-thriftserver -Pmesos -Pyarn On Mon, Jun 12, 2017 at 6:14 PM Chanh Le <giaosu...@gmail.com> wrote: > Hi everyone, > > Recently I discovered an issue when processing CSV in Spark, so I decided > to fix it following this https://issues.apache.org/jira/browse/SPARK-21024

SPARK environment settings issue when deploying a custom distribution

2017-06-12 Thread Chanh Le
Hi everyone, Recently I discovered an issue when processing CSV in Spark, so I decided to fix it following https://issues.apache.org/jira/browse/SPARK-21024. I built a custom distribution for internal use. I built it on my local machine and then uploaded the distribution to the server. The server's

Re: [CSV] If the number of columns in one row exceeds maxColumns, it stops the whole parsing process.

2017-06-09 Thread Chanh Le
Hi Takeshi, Thank you very much. Regards, Chanh On Thu, Jun 8, 2017 at 11:05 PM Takeshi Yamamuro <linguin@gmail.com> wrote: > I filed a jira about this issue: > https://issues.apache.org/jira/browse/SPARK-21024 > > On Thu, Jun 8, 2017 at 1:27 AM, Chanh Le <giaos

Re: [CSV] If the number of columns in one row exceeds maxColumns, it stops the whole parsing process.

2017-06-08 Thread Chanh Le
Can you recommend one? Thanks. On Thu, Jun 8, 2017 at 2:47 PM Jörn Franke <jornfra...@gmail.com> wrote: > You can change the CSV parser library > > On 8. Jun 2017, at 08:35, Chanh Le <giaosu...@gmail.com> wrote: > > > I did add mode -> DROPMALFORMED but it

Re: [CSV] If the number of columns in one row exceeds maxColumns, it stops the whole parsing process.

2017-06-08 Thread Chanh Le
> ... include lines that have more than maxColumns. Choose mode "DROPMALFORMED" > > On 8. Jun 2017, at 03:04, Chanh Le <giaosu...@gmail.com> wrote: > > Hi Takeshi, Jörn Franke, > > The problem is even if I increase maxColumns, some lines still have larger

Re: [CSV] If the number of columns in one row exceeds maxColumns, it stops the whole parsing process.

2017-06-07 Thread Chanh Le
On Wed, Jun 7, 2017 at 9:45 AM, Jörn Franke <jornfra...@gmail.com> wrote: > >> Spark CSV data source should be able >> >> On 7. Jun 2017, at 17:50, Chanh Le <giaosu...@gmail.com> wrote: >> >> Hi everyone, >> I am using Spark 2.1.1 to read csv files and

[CSV] If the number of columns in one row exceeds maxColumns, it stops the whole parsing process.

2017-06-07 Thread Chanh Le
Hi everyone, I am using Spark 2.1.1 to read csv files and convert them to avro files. One problem I am facing is that if one row of a csv file has more columns than maxColumns (default is 20480), the whole parsing process stops. Internal state when the error was thrown: line=1, column=3, record=0,
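
For reference, a minimal sketch of the mitigation discussed later in this thread, assuming a Spark 2.1-style session and a hypothetical input path. maxColumns is a documented CSV reader option; SPARK-21024 was filed precisely because rows exceeding it abort parsing even in DROPMALFORMED mode, so raising the cap above the widest expected row is the practical workaround:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("csv-to-avro").getOrCreate()

    // Raise the parser's column cap above the widest row we expect;
    // the default of 20480 is what the failing rows exceeded.
    val df = spark.read
      .option("header", "true")
      .option("maxColumns", "100000")
      .option("mode", "DROPMALFORMED")   // drop other malformed rows
      .csv("/data/input/*.csv")          // hypothetical path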

Re: How to write a query with not-contain, not-start_with, not-end_with conditions effectively?

2017-02-21 Thread Chanh Le
Id where url not like '%sell%', then you can just > try a left semi join, which Spark will use a SortMerge join for in this case, I guess. > > Yong > > From: Yong Zhang <java8...@hotmail.com> > Sent: Tuesday, February 21, 2017 1:17 PM > To: S

Re: How to write a query with not-contain, not-start_with, not-end_with conditions effectively?

2017-02-21 Thread Chanh Le
PM, Chanh Le <giaosu...@gmail.com> wrote: > > Hi everyone, > > I am working on a dataset like this > user_id url > 1 lao.com/buy > 2 bao.com/sell > 2

How to write a query with not-contain, not-start_with, not-end_with conditions effectively?

2017-02-21 Thread Chanh Le
Hi everyone, I am working on a dataset like this (user_id, url): (1, lao.com/buy), (2, bao.com/sell), (2, cao.com/market), (1, lao.com/sell), (3, vui.com/sell). I have to find all user_id whose url does not contain 'sell'.
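
A minimal sketch of one way to answer this, using the left-anti-join variant of the semi-join approach suggested downthread, with the sample rows inlined (column names taken from the dataset above):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()
    import spark.implicits._

    val logs = Seq(
      (1, "lao.com/buy"), (2, "bao.com/sell"), (2, "cao.com/market"),
      (1, "lao.com/sell"), (3, "vui.com/sell")
    ).toDF("user_id", "url")

    // Users with at least one url containing "sell" ...
    val sellers = logs.filter($"url".contains("sell"))
      .select("user_id").distinct()

    // ... anti-joined away, leaving only users with no "sell" url at all.
    val nonSellers = logs.select("user_id").distinct()
      .join(sellers, Seq("user_id"), "left_anti")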

How to configure the zookeeper quorum in the sqlline command?

2017-02-15 Thread Chanh Le
Hi everybody, I am a newbie who started using Phoenix a few days ago; after some research about configuring the zookeeper quorum I am still stuck, so I finally want to ask the community directly. My current zk quorum is a little odd: "hbase.zookeeper.quorum" = "zoo1:2182,zoo1:2183,zoo2:2182". I edited the

How to set the classpath for a job submitted to a Mesos cluster

2016-12-13 Thread Chanh Le
Hi everyone, I have a job that reads segment data from Druid and then converts it to csv. When I run it in local mode it works fine. /home/airflow/spark-2.0.2-bin-hadoop2.7/bin/spark-submit --driver-memory 1g --master "local[4]" --files /home/airflow/spark-jobs/forecast_jobs/prod.conf --conf

[jira] [Created] (ZEPPELIN-1723) Math formula support library path error

2016-11-28 Thread Chanh Le (JIRA)
Chanh Le created ZEPPELIN-1723: -- Summary: Math formula support library path error Key: ZEPPELIN-1723 URL: https://issues.apache.org/jira/browse/ZEPPELIN-1723 Project: Zeppelin Issue Type: Bug

Re: Sharing RDDS across applications and users

2016-10-28 Thread Chanh Le

Re: Sharing RDDS across applications and users

2016-10-27 Thread Chanh Le
On 27 October 2016 at 11:29, Chanh Le <giaosu...@gmail.com> wrote: > Hi Mich,

Re: Sharing RDDS across applications and users

2016-10-27 Thread Chanh Le
Hi Mich, Alluxio is a good option to go with. Regards, Chanh > On Oct 27, 2016, at 5:28 PM, Mich Talebzadeh > wrote: > > There was a mention of using Zeppelin to share RDDs with many users. From the > notes on Zeppelin it appears that this is sharing the UI and I am
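
For context, the usual Alluxio sharing pattern looks roughly like this. A sketch only: the master address and paths are hypothetical, and both applications need the Alluxio client jar on their classpath:

    import org.apache.spark.sql.SparkSession

    // Application A materializes the dataset once into Alluxio.
    val spark = SparkSession.builder().appName("writer").getOrCreate()
    val df = spark.read.parquet("hdfs:///raw/ads")   // hypothetical source
    df.write.mode("overwrite")
      .parquet("alluxio://alluxio-master:19998/shared/ads")

    // Application B, a completely separate SparkContext, reads the same
    // copy back instead of recomputing it:
    // val shared = spark.read.parquet("alluxio://alluxio-master:19998/shared/ads")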

Re: How to make Mesos Cluster Dispatcher of Spark 1.6.1 load my config files?

2016-10-19 Thread Chanh Le
I found a workaround that works for me: > http://stackoverflow.com/questions/29552799/spark-unable-to-find-jdbc-driver/40114125#40114125 > > Regards, > Daniel > > On Thu.

Re: What is the difference between mini-batch vs real time streaming in practice (not theory)?

2016-09-27 Thread Chanh Le
The difference between streaming and micro-batch is the ordering of messages. > Spark Streaming guarantees ordered processing of RDDs in one DStream. Since > each RDD is processed in parallel, no order is guaranteed within the > RDD. This is a tradeoff design Spark made. If you want to

Re: Using Zeppelin with Spark FP

2016-09-15 Thread Chanh Le
On 15 September 2016 at 08:27, Chanh Le <giaosu...@gmail.com> wrote: > Hi, > I am using the Zeppelin 0.7 snapshot and it works well with both Spark 2.0 and the STS of

Re: Using Zeppelin with Spark FP

2016-09-15 Thread Chanh Le
Hi, I am using the Zeppelin 0.7 snapshot and it works well with both Spark 2.0 and the STS of Spark 2.0. > On Sep 12, 2016, at 4:38 PM, Mich Talebzadeh > wrote: > > Hi Sachin, > > Downloaded Zeppelin 0.6.1 > > Now I can see the plot in a tabular format and as a graph. It looks

Re: Zeppelin patterns with the streaming data

2016-09-13 Thread Chanh Le
Hi Mich, I think it can http://www.quartz-scheduler.org/documentation/quartz-2.1.x/tutorials/crontrigger > On Sep 13, 2016, at 1:57 PM, Mich Talebzadeh > wrote: > > Thanks

Re: Spark 2.0.0 Thrift Server problem with Hive metastore

2016-09-06 Thread Chanh Le
Did anyone use the STS of Spark 2.0 in production? For me, I am still waiting for compatibility with parquet files created by Spark 1.6.1. > On Sep 6, 2016, at 2:46 PM, Campagnola, Francesco > wrote: > > I mean I have installed Spark 2.0 in the same environment where

Re: Design patterns involving Spark

2016-08-29 Thread Chanh Le
Hi everyone, It seems a lot of people are using Druid for realtime dashboards. I’m just wondering about using Druid as the main storage engine, because Druid can store the raw data and can integrate with Spark as well (in theory). In that case do we need 2 separate storages for Druid (storing segments in HDFS)

Re: Best practices for storing data in Parquet files

2016-08-28 Thread Chanh Le
> Does a parquet file have a size limit (1 TB)? I didn’t see any problem, but 1 TB is too big to operate on; you need to divide it into smaller pieces. > Should we use SaveMode.Append for a long-running streaming app? Yes, but you need to partition it by time so it is easy to maintain, e.g. to update or delete a
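
A sketch of the time-partitioned append described above, assuming a hypothetical DataFrame df with an event_time column and a hypothetical output path:

    import org.apache.spark.sql.SaveMode
    import org.apache.spark.sql.functions.{col, to_date}

    // One folder per day on disk; maintenance (update/delete) then only
    // has to rewrite the affected date=... directories.
    df.withColumn("date", to_date(col("event_time")))
      .write
      .mode(SaveMode.Append)        // safe for a long-running streaming app
      .partitionBy("date")
      .parquet("hdfs:///warehouse/events")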

Re: Anyone else having trouble with replicated off heap RDD persistence?

2016-08-16 Thread Chanh Le
Hi Michael, You should use Alluxio instead. http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html It should be easier. Regards, Chanh > On Aug 17, 2016, at 5:45 AM, Michael Allman

Re: Does Spark SQL support indexes?

2016-08-13 Thread Chanh Le
Hi Taotao, Spark SQL doesn’t support indexes :). > On Aug 14, 2016, at 10:03 AM, Taotao.Li wrote: > > > hi, guys, does Spark SQL support indexes? if so, how can I create an index > on my temp table? if not, how can I handle some specific queries on a very > large

Re: Spark Thrift Server (Spark 2.0) shows NULL values in all fields of a table

2016-08-10 Thread Chanh Le
Hi Gene, It's a Spark 2.0 issue. I switched to Spark 1.6.1 and it's ok now. Thanks. On Thursday, July 28, 2016 at 4:25:48 PM UTC+7, Chanh Le wrote: > > Hi everyone, > > I have a problem when I create an external table in Spark Thrift Server (STS) > and query the data. > > S

Re: hdfs persist rollbacks when spark job is killed

2016-08-08 Thread Chanh Le
> Regards, > Gourav Sengupta > > On Mon, Aug 8, 2016 at 7:41 AM, Chanh Le <giaosu...@gmail.com> wrote: > It’s out of the box in Spark. > When you write data into HDFS or any storage it only creates a new parquet > folder properly i

Re: hdfs persist rollbacks when spark job is killed

2016-08-08 Thread Chanh Le
It’s out of the box in Spark. When you write data into HDFS or any storage, it only creates the new parquet folder properly if your Spark job succeeded; otherwise there is only a _temporary folder inside, marking that it did not finish (Spark was killed), or nothing inside (the Spark job failed). > On Aug 8, 2016,
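
A downstream job can gate on the marker Spark's file output committer leaves behind; a minimal sketch using the Hadoop FileSystem API (the directory name is hypothetical):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path

    // A committed output folder contains _SUCCESS; a killed job leaves
    // only _temporary behind, and a failed one leaves nothing usable.
    def isCommitted(dir: String): Boolean = {
      val marker = new Path(dir, "_SUCCESS")
      marker.getFileSystem(new Configuration()).exists(marker)
    }

    isCommitted("hdfs:///warehouse/events/2016-08-08")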

Re: [Spark1.6] Or (||) operator not working in DataFrame

2016-08-07 Thread Chanh Le
You should use df.where(conditionExpr), which is more convenient for expressing simple terms in SQL:

    /**
     * Filters rows using the given SQL expression.
     * {{{
     *   peopleDf.where("age > 15")
     * }}}
     * @group dfops
     * @since 1.5.0
     */
    def where(conditionExpr: String): DataFrame = {
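
Applied to the || problem in the subject, the SQL-expression form sidesteps the operator-precedence pitfalls of the Column syntax; a sketch with hypothetical column names, assuming df is an existing DataFrame:

    // Equivalent to df.filter(col("age") > 15 || col("city") === "Hanoi"),
    // but with the whole condition written as one SQL string.
    val matched = df.where("age > 15 OR city = 'Hanoi'")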

Re: [Spark 2.0] Problem with Spark Thrift Server showing NULL instead of BIGINT values

2016-08-04 Thread Chanh Le
I checked with Spark 1.6.1 and it still works fine. I also checked out the latest source code on the Spark 2.0 branch, built it, and got the same issue. I think it is because of the API change to Dataset in Spark 2.0? Regards, Chanh > On Aug 5, 2016, at 9:44 AM, Chanh Le <giaosu...@gmail.com> wrote

Re: [Spark 2.0] Problem with Spark Thrift Server showing NULL instead of BIGINT values

2016-08-04 Thread Chanh Le
Nicholas Szandor Hakobian, Ph.D. > Data Scientist > Rally Health > nicholas.hakob...@rallyhealth.com > > On Thu, Aug 4, 2016 at 4:53 AM, Chanh Le <giaosu...@gmail.com> wrote:

Re: [Thriftserver2] Controlling number of tasks

2016-08-03 Thread Chanh Le
I believe there is no way to reduce the number of tasks on the Hive side using coalesce, because when it comes to Hive it just reads the files, and the task count depends on the number of files you put in. So the way I did it was to coalesce at the ELT layer: write as small a number of files as possible, to reduce the IO time spent reading files. > On Aug 3,
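
The compaction step described here is just a coalesce before the write; a sketch with hypothetical file counts and paths:

    // Hive/STS spawn roughly one task per file, so writing 8 files
    // instead of hundreds directly shrinks the downstream task count.
    df.coalesce(8)
      .write.mode("overwrite")
      .parquet("hdfs:///warehouse/ads_hourly")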

Re: Is there a way to configure a default query limit on STS?

2016-08-02 Thread Chanh Le

Re: Is there a way to configure a default query limit on STS?

2016-08-02 Thread Chanh Le
On 2 August 2016 at 10:18, Chanh Le <giaosu...@gmail.com> wrote: > I tried and it works

Re: Is there a way to configure a default query limit on STS?

2016-08-02 Thread Chanh Le
On 2 August 2016 at 09:13, Chanh Le <giaosu...@gmail.co

Re: Is there a way to configure a default query limit on STS?

2016-08-02 Thread Chanh Le
On 2 August 2016 at 08:41, Chanh Le <giaosu...@gm

Is there a way to configure a default query limit on STS?

2016-08-02 Thread Chanh Le
Hi everyone, I set up STS and use Zeppelin to query data through a JDBC connection. A problem we are facing is that users usually forget to put a LIMIT in their queries, so something like SELECT * FROM tableA; hangs the cluster. Is there any way to configure a default limit? Regards, Chanh

Re: [Spark 2.0] Why MutableInt cannot be cast to MutableLong?

2016-07-31 Thread Chanh Le
I think this error still happens in Spark 2.0. > On Aug 1, 2016, at 9:21 AM, Chanh Le <giaosu...@gmail.com> wrote: > > Sorry, my bad, I ran it in Spark 1.6.1, but what about this error? > Why can't Int be cast to Long? > > > Thanks. > > >> On Aug 1, 2

Re: [Spark 2.0] Why MutableInt cannot be cast to MutableLong?

2016-07-31 Thread Chanh Le
DD, which was removed in #12354 <https://github.com/apache/spark/pull/12354>. > > On Sun, Jul 31, 2016 at 2:12 AM, Chanh Le <giaosu...@gmail.com> wrote: > Hi everyone, > Why can't MutableInt be cast to MutableLong? > It’s real

[Spark 2.0] Why MutableInt cannot be cast to MutableLong?

2016-07-31 Thread Chanh Le
Hi everyone, Why can't MutableInt be cast to MutableLong? It’s really weird, and Spark 2.0 seems to have a lot of errors like this with the parquet format. org.apache.spark.sql.catalyst.expressions.MutableInt cannot be cast to org.apache.spark.sql.catalyst.expressions.MutableLong Caused by:
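
This cast error typically surfaces when some Parquet files carry a column as INT32 while the table schema expects INT64. A hedged prevention (not a fix for the reader itself) is to normalize column types before writing, so every file under the table has the same physical type; column and path names here are hypothetical:

    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.types.LongType

    // Widen the ambiguous column to long before writing, so readers
    // never see a mix of INT32 and INT64 files under one LongType column.
    df.withColumn("time", col("time").cast(LongType))
      .write.mode("append")
      .parquet("hdfs:///warehouse/events")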

[jira] [Commented] (SPARK-16518) Schema Compatibility of Parquet Data Source

2016-07-30 Thread Chanh Le (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400755#comment-15400755 ] Chanh Le commented on SPARK-16518: -- Did we have a patch for that? Right now I have this error too

Re: Spark Thrift Server (Spark 2.0) shows NULL values in all fields of a table

2016-07-30 Thread Chanh Le
Hi Mich, something is different in your log. > On Jul 30, 2016, at 6:58 PM, Mich Talebzadeh > wrote: > > parquet-mr version 1.6.0 > org.apache.parquet.VersionParser$VersionParseException: Could not parse > created_by: parquet-mr version 1.6.0 using format: (.+)

Re: Spark Thrift Server (Spark 2.0) shows NULL values in all fields of a table

2016-07-30 Thread Chanh Le
On 30 July 2016 at 11:52, Chanh Le <giaosu...@gmail.com> wrote: > I agree with you. Maybe some ch

Re: Spark Thrift Server (Spark 2.0) shows NULL values in all fields of a table

2016-07-30 Thread Chanh Le
On 30 July 2016 at 11:43, Chanh Le <giaosu...@gmail.com> wrote:

Re: Spark Thrift Server (Spark 2.0) shows NULL values in all fields of a table

2016-07-30 Thread Chanh Le

Re: Spark Standalone Cluster: Having a master and worker on the same node

2016-07-28 Thread Chanh Le
Hi Jestin, I have seen most setups put master and slave on the same node, because the master doesn't do as much work as a slave does, and resources are expensive so we need to use them. BTW in my setup I put master and slave together: I have 5 nodes, and 3 of them run both master and slave

Re: Spark Thrift Server 2.0 set spark.sql.shuffle.partitions not working when query

2016-07-28 Thread Chanh Le
Thank you Takeshi it works fine now. Regards, Chanh > On Jul 28, 2016, at 2:03 PM, Takeshi Yamamuro <linguin@gmail.com> wrote: > > Hi, > > you need to set the value when you just start the server. > > // maropu > > On Thu, Jul 28, 2016 at 3:59

Spark 2.0 just released

2016-07-26 Thread Chanh Le
It's official now: http://spark.apache.org/releases/spark-release-2-0-0.html Everyone should check it out.

Re: Spark Web UI port 4040 not working

2016-07-26 Thread Chanh Le
You’re running in standalone mode? Usually inside the active task it will show the address of the current job. Or you can check on the master node using netstat -apn | grep 4040 > On Jul 26, 2016, at 8:21 AM, Jestin Ma wrote: > > Hello, when running spark jobs, I can access

Re: dataframe.foreach VS dataframe.collect().foreach

2016-07-26 Thread Chanh Le
Hi Ken, blacklistDF -> just a DataFrame. Spark is lazy: until you call something like collect, take, or write, it won't execute the whole process, even though you did a map or filter beforehand. That means until you call collect, Spark does nothing, so your df won't have any data -> you can't call foreach. Call
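
A sketch of the difference in Spark 2.x Dataset terms, assuming a hypothetical session and source path; note where each println actually runs:

    val blacklistDF = spark.read.parquet("hdfs:///data/blacklist")
      .filter("score > 0.9")              // transformation: nothing runs yet

    // An action, but the closure executes on the executors, so the
    // output appears in executor logs, not the driver console.
    blacklistDF.foreach(row => println(row))

    // collect() is the action that ships the rows to the driver; only
    // then does foreach print locally.
    blacklistDF.collect().foreach(println)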

Re: Optimize filter operations with sorted data

2016-07-21 Thread Chanh Le
t of time and network in the > filter? > > 2016-07-07 11:58 GMT+02:00 Chanh Le <giaosu...@gmail.com>: > >> Hi Tan, >> It depends on how the data is organised and what your filter is. >> For example, in my case I store data partitioned by the fields time and >> ne

Re: run spark apps in linux crontab

2016-07-21 Thread Chanh Le
goes to the log file and STDOUT at the same time. > > > > Thanks & Best regards! > San.Luo > > - Original message - > From: Chanh Le <giaosu...@gmail.com> > To: luohui20...@sina.com > Cc: focus <focushe...@qq.com>, user <user@spark.apache.org> > Subject: R

Re: run spark apps in linux crontab

2016-07-20 Thread Chanh Le
You should use command.sh | tee file.log > On Jul 21, 2016, at 10:36 AM, > wrote: > > > Thank you focus, and all. > This problem was solved by adding the line ". /etc/profile" to my shell script. > > > > > Thanks & Best

Attribute name "sum(proceeds)" contains invalid character(s) among " ,;{}()\n\t="

2016-07-20 Thread Chanh Le
Hi everybody, I got an error because a column name does not follow the naming rule. Please tell me the way to fix it. Here is my code. metricFields is a Seq of metrics: spent, proceed, click, impression. sqlContext .sql(s"select * from hourly where time between '$dateStr-00' and
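
Parquet rejects column names containing " ,;{}()\n\t=", which is exactly what an unaliased aggregate like sum(proceeds) produces. A sketch of the usual fix, aliasing each metric from the Seq before writing; df and the _total suffix are hypothetical:

    import org.apache.spark.sql.functions.sum

    val metricFields = Seq("spent", "proceed", "click", "impression")

    // sum("spent") would otherwise surface as a column named "sum(spent)",
    // which parquet's name check refuses; alias it to something legal.
    val aggs = metricFields.map(m => sum(m).alias(s"${m}_total"))
    val hourlyTotals = df.groupBy("time").agg(aggs.head, aggs.tail: _*)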

[jira] [Created] (MESOS-5868) Task is running but not show in UI

2016-07-19 Thread Chanh Le (JIRA)
Chanh Le created MESOS-5868: --- Summary: Task is running but not show in UI Key: MESOS-5868 URL: https://issues.apache.org/jira/browse/MESOS-5868 Project: Mesos Issue Type: Bug Components

[jira] [Updated] (MESOS-5868) Task is running but not show in UI

2016-07-19 Thread Chanh Le (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chanh Le updated MESOS-5868: Description: This happens when I try to restart the master nodes without bringing down any slaves. As you can see

Re: the spark job is so slow - almost frozen

2016-07-18 Thread Chanh Le
Hi, What about the network (bandwidth) between Hive and Spark? Did it run in Hive before you moved it to Spark? Because it's complex, you can use something like the EXPLAIN command to show what's going on. > On Jul 18, 2016, at 5:20 PM, Zhiliang Zhu wrote: > > the
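
For reference, the EXPLAIN suggestion in DataFrame form; with the argument set to true, Spark prints the logical plans as well as the physical one (df stands in for the slow query's frame):

    // Inspect the plan before re-running the query that appears frozen.
    df.explain(true)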

Re: Inode for STS

2016-07-18 Thread Chanh Le
Hi Ayan, it seems like you mean this: https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.start.cleanup.scratchdir

Re: How to run Zeppelin and Spark Thrift Server Together

2016-07-17 Thread Chanh Le
matching, that's what Simba is > complaining about. Try changing the protocol to SASL? > > On Fri, Jul 15, 2016 at 1:20 PM, Chanh Le <giaosu...@gmail.com> wrote: > Hi Ayan, > Thanks. I got it. > Did you have any problem when conne

Re: Memory issue java.lang.OutOfMemoryError: Java heap space

2016-07-13 Thread Chanh Le
Going "production" was a little more complex that I > thought. > >> On Jul 13, 2016, at 10:35 PM, Chanh Le <giaosu...@gmail.com >> <mailto:giaosu...@gmail.com>> wrote: >> >> Hi Jean, >> How do you run your Spark Application? Local Mode, C

Re: Memory issue java.lang.OutOfMemoryError: Java heap space

2016-07-13 Thread Chanh Le
Hi Jean, How do you run your Spark application? Local mode or cluster mode? If you run in local mode, did you use --driver-memory? Because in local mode your --executor-memory setting doesn't work the way you expect; everything runs inside the driver JVM. > On Jul 14, 2016, at 8:43 AM, Jean Georges Perrin

Re: How to run Zeppelin and Spark Thrift Server Together

2016-07-13 Thread Chanh Le
b. I think you can set these properties (the same way you'd do in the hive cli) > c. You can create tables/databases with a LOCATION clause, in case you need > to use a non-standard path. > > Best > Ayan > > On Wed, Jul 13, 2016 at 3:20 PM, Chanh Le <giaosu...@gmail.com>

Re: How to run Zeppelin and Spark Thrift Server Together

2016-07-12 Thread Chanh Le
interpreter but I need to add it to zeppelin :) > > Best > Ayan > > On Wed, Jul 13, 2016 at 1:53 PM, Chanh Le <giaosu...@gmail.com> wrote: > Hi Ayan, > How do I set the hive metastore in Zeppelin? I tried but without success. > The wa

Re: Spark cache behaviour when the source table is modified

2016-07-12 Thread Chanh Le
Hi Anjali, The cache is immutable; you can't update data in it. The way to update the cache is to re-create it. > On Jun 16, 2016, at 4:24 PM, Anjali Chadha wrote: > > Hi all, > > I am having a hard time understanding the caching concepts in Spark. > > I have a hive
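
A sketch of the re-create step in Spark 2.x catalog terms (Spark 1.6 has equivalent uncacheTable/refreshTable calls on sqlContext); the table name "hourly" is hypothetical:

    // Drop the stale snapshot, re-scan the source files, re-materialize.
    spark.catalog.uncacheTable("hourly")
    spark.catalog.refreshTable("hourly")
    spark.sql("CACHE TABLE hourly")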

Re: Connection via JDBC to Oracle hangs after count call

2016-07-11 Thread Chanh Le
Hi Mich, If I have a stored procedure in Oracle written like this: PKG_ETL.GET_OBJECTS_INFO( p_LAST_UPDATED VARCHAR2, p_OBJECT_TYPE VARCHAR2, p_TABLE OUT SYS_REFCURSOR); how do I call it in Spark, given that the output is a cursor (p_TABLE OUT SYS_REFCURSOR)? Thanks.
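
Spark's JDBC source reads tables and queries, not OUT ref cursors, so a hedged workaround is to call the procedure through plain JDBC on the driver and build a DataFrame from the result set by hand. OracleTypes comes from the Oracle driver jar, and the connection details and parameter values below are hypothetical:

    import java.sql.{DriverManager, ResultSet}

    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@db-host:1521:ORCL", "user", "password")

    val cs = conn.prepareCall("{call PKG_ETL.GET_OBJECTS_INFO(?, ?, ?)}")
    cs.setString(1, "2016-07-01")                       // p_LAST_UPDATED
    cs.setString(2, "TABLE")                            // p_OBJECT_TYPE
    cs.registerOutParameter(3, oracle.jdbc.OracleTypes.CURSOR)
    cs.execute()

    // Walk the cursor, collect rows, then spark.createDataFrame(rows, schema).
    val rs = cs.getObject(3).asInstanceOf[ResultSet]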

Re: Zeppelin Spark with Dynamic Allocation

2016-07-11 Thread Chanh Le

Re: How to run Zeppelin and Spark Thrift Server Together

2016-07-10 Thread Chanh Le
> // maropu > > On Mon, Jul 11, 2016 at 12:01 PM, ayan guha <guha.a...@gmail.com> wrote: > Hi > > Can you try using the JDBC interpreter with STS? We have been using Zeppelin+STS on > YARN for a few months now without much i

Re: How to run Zeppelin and Spark Thrift Server Together

2016-07-10 Thread Chanh Le
s now without much issue. > > On Mon, Jul 11, 2016 at 12:48 PM, Chanh Le <giaosu...@gmail.com> wrote: > Hi everybody, > We are using Spark to query big data, and currently we’re using Zeppelin to > provide a UI for technical users.

How to run Zeppelin and Spark Thrift Server Together

2016-07-10 Thread Chanh Le
Hi everybody, We are using Spark to query big data, and currently we’re using Zeppelin to provide a UI for technical users. Now we also need to provide a UI for business users, so we use Oracle BI tools and set up a Spark Thrift Server (STS) for them. When I run both, Zeppelin and STS throw an error:

Re: problem making Zeppelin 0.6 work with Spark 1.6.1, throwing jackson.databind.JsonMappingException exception

2016-07-09 Thread Chanh Le
Hi, This is weird, because I have been using Zeppelin since version 0.5.6 and just upgraded to 0.6.0 a couple of days ago; both work fine with Spark 1.6.1. For 0.6.0 I am using zeppelin-0.6.0-bin-netinst. > On Jul 9, 2016, at 9:25 PM, Mich Talebzadeh wrote: > > Hi, > > I just

Re: Why so many parquet file parts when I store data in Alluxio or a file?

2016-07-08 Thread Chanh Le
> writing out their partition of the files. > > Hope that helps, > Gene > > On Sun, Jul 3, 2016 at 8:02 PM, Chanh Le <giaosu...@gmail.com> wrote: > Hi Gene, > Could you give some suggestions on that?

Re: Any ways to connect BI tool to Spark without Hive

2016-07-07 Thread Chanh Le
elopment using Zeppelin and STS. > > One thing to note: many BI tools like Qlik Sense and Tableau (not sure about the Oracle > BI tool) query and then cache data on the client side. This works really well > in real life. > > On Fri, Jul 8, 2016 at 1:58 PM, Chanh Le <giaosu...@gmai

Re: Any ways to connect BI tool to Spark without Hive

2016-07-07 Thread Chanh Le
On 8 July 2016 at 04:58, Chanh Le <giaosu...

Re: Any ways to connect BI tool to Spark without Hive

2016-07-07 Thread Chanh Le
On 8 July 2016 at 04:19, Chanh Le <giaosu

Any ways to connect BI tool to Spark without Hive

2016-07-07 Thread Chanh Le
Hi everyone, Currently we use Zeppelin to analyze our data, and because it is SQL-based it’s hard to roll out to users. But users are already using some kind of Oracle BI tool for analytics, because it supports drag and drop and we can set per-user permissions. Our

Re: Optimize filter operations with sorted data

2016-07-07 Thread Chanh Le
Hi Tan, It depends on how the data is organised and what your filter is. For example, in my case I store data partitioned by the fields time and network_id. If I filter by time or network_id (or both) plus another field, Spark only loads the part of time and network in the filter, then filters the rest. > On Jul 7,
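
A sketch of the layout and the pruned read, with hypothetical paths; because time and network_id are partition columns, the filter prunes directories at planning time instead of scanning everything:

    import org.apache.spark.sql.functions.col

    // Layout on disk: .../events/time=2016-07-01/network_id=42/part-*.parquet
    val events = sqlContext.read.parquet("alluxio://master:19998/events")

    // Only the matching time=/network_id= folders are listed and read.
    val slice = events.filter(col("time") === "2016-07-01" &&
                              col("network_id") === 42)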

Re: Why so many parquet file parts when I store data in Alluxio or a file?

2016-07-03 Thread Chanh Le
Hi Gene, Could you give some suggestions on that? > On Jul 1, 2016, at 5:31 PM, Ted Yu <yuzhih...@gmail.com> wrote: > > The comment from zhangxiongfei was from a year ago. > > Maybe something changed since then? > > On Fri, Jul 1, 2016 at 12:07 AM, Ch

Re: Why so many parquet file parts when I store data in Alluxio or a file?

2016-06-30 Thread Chanh Le
ill be writing their > partitions to separate part files. > > Thanks > Deepak > > On 1 Jul 2016 8:01 am, "Chanh Le" <giaosu...@gmail.com> wrote: > Hi everyone, > I am using Alluxio for storage, but I am a little bit conf

Re: Looking for help about stackoverflow in spark

2016-06-30 Thread Chanh Le
Hi John, I think it relates to driver memory more than the other things you mentioned. Can you just give the driver more memory? > On Jul 1, 2016, at 9:03 AM, johnzeng wrote: > > I am trying to load a 1 TB collection into the spark cluster from mongo, but I > keep getting

Re: Best practice for handing tables between pipeline components

2016-06-29 Thread Chanh Le
Hi Everett, We have been using Alluxio for the last 2 months. We implemented Alluxio for sharing data between Spark jobs, isolating Spark to the processing layer only and Alluxio to the storage layer. > On Jun 29, 2016, at 2:52 AM, Everett Anderson > wrote: > > Thanks! Alluxio

Re: Spark 2.0 Preview: after caching, query didn't work and job can't be killed.

2016-06-15 Thread Chanh Le
cala:2156) at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2155) at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2449) at org.apache.spark.sql.Dataset.count(Dataset.scala:2155) ... 48 elided. I lost all my executors. > On Jun 15, 2016, at 8:44 PM, Chanh Le <giao

Re: Spark 2.0 Preview: after caching, query didn't work and job can't be killed.

2016-06-15 Thread Chanh Le
ks, > Gene > > On Tue, Jun 14, 2016 at 3:45 AM, Chanh Le <giaosu...@gmail.com> wrote: > I am testing Spark 2.0. > I load data from Alluxio and cache it, then I query, and the first query is ok > because it kicks off the cache action. But

Spark 2.0 Preview: after caching, query didn't work and job can't be killed.

2016-06-14 Thread Chanh Le
I am testing Spark 2.0. I load data from Alluxio and cache it, then I query; the first query is ok because it kicks off the cache action. But after that I run the query again and it’s stuck. I ran on a 5-node cluster in spark-shell. Did anyone else have this issue?

Re: Spark Partition by Columns doesn't work properly

2016-06-09 Thread Chanh Le
Ok, thanks. On Thu, Jun 9, 2016, 12:51 PM Jasleen Kaur <jasleenkaur1...@gmail.com> wrote: > The github repo is https://github.com/datastax/spark-cassandra-connector > > The talk video and slides should be uploaded soon on spark summit website > > > On Wednesday, June 8

Re: Spark Partition by Columns doesn't work properly

2016-06-08 Thread Chanh Le
us on > real business value > > On Wednesday, June 8, 2016, Chanh Le <giaosu...@gmail.com> wrote: > >> Hi everyone, >> I tested the partition by columns of a data frame but the result is wrong. >> I am using Spark 1.6.1, loading data from Cassandra. >>

Spark Partition by Columns doesn't work properly

2016-06-08 Thread Chanh Le
Hi everyone, I tested partitioning a data frame by columns but the result is not good; I mean it is wrong. I am using Spark 1.6.1, loading data from Cassandra. I repartition by 2 fields (date, network_id) - 200 partitions. I repartition by 1 field (date) - 200 partitions. But my data covers 90 days -> I mean if we
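
The 200 here is spark.sql.shuffle.partitions, not the number of distinct dates: repartition(col) hash-partitions rows into that many buckets. A sketch of both calls, assuming a hypothetical df with a date column:

    import org.apache.spark.sql.functions.col

    // Hash-partitions into spark.sql.shuffle.partitions buckets (200 by
    // default), regardless of how many distinct dates exist.
    val byDate = df.repartition(col("date"))

    // One directory per distinct date happens at write time instead:
    df.write.partitionBy("date").parquet("hdfs:///warehouse/daily")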

[jira] [Issue Comment Deleted] (SPARK-7703) Task failure caused by block fetch failure in BlockManager.doGetRemote() when using TorrentBroadcast

2016-05-27 Thread Chanh Le (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chanh Le updated SPARK-7703: Comment: was deleted (was: Any update on that? I have the same error too. java.io.IOException

[jira] [Commented] (SPARK-7703) Task failure caused by block fetch failure in BlockManager.doGetRemote() when using TorrentBroadcast

2016-05-27 Thread Chanh Le (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303655#comment-15303655 ] Chanh Le commented on SPARK-7703: - Any update on that? I have the same error. java.io.IOException

[jira] [Comment Edited] (SPARK-7703) Task failure caused by block fetch failure in BlockManager.doGetRemote() when using TorrentBroadcast

2016-05-27 Thread Chanh Le (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303655#comment-15303655 ] Chanh Le edited comment on SPARK-7703 at 5/27/16 6:52 AM: -- Any update on that? I

[jira] [Commented] (MESOS-4565) slave recovers and attempt to destroy executor's child containers, then begins rejecting task status updates

2016-05-23 Thread Chanh Le (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297666#comment-15297666 ] Chanh Le commented on MESOS-4565: - Any update on that? I still get the issues. > slave recov

[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-04-27 Thread Chanh Le (JIRA)
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261541#comment-15261541 ] Chanh Le commented on CASSANDRA-10661: -- [~xedin] Thanks, man. You made my day. > Integrate S

[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-04-27 Thread Chanh Le (JIRA)
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261528#comment-15261528 ] Chanh Le commented on CASSANDRA-10661: -- Hi, I am using Cassandra 3.5 and I have a problem when

How to config my django project to use tastypie and mongoengine

2012-11-25 Thread Chanh Le
In settings.py:

    import mongoengine
    mongoengine.connect('cooking')

    AUTHENTICATION_BACKENDS = (
        'mongoengine.django.auth.MongoEngineBackend',
    )
    SESSION_ENGINE = 'mongoengine.django.sessions'
    MIDDLEWARE_CLASSES = (
        'django.middleware.common.CommonMiddleware',

Chanh Le is out of the office.

2008-02-07 Thread Chanh Le
I will be out of the office starting 02/02/2008 and will not return until 02/19/2008. If you need immediate assistance, please contact Murali Bharathan at 818-575-1500 or internally at 578-4304. Please contact Murali Bharathan or Kathy Ragatz for AS400 issues. Thanks. CL

Chanh Le is out of the office.

2005-01-23 Thread Chanh Le
I will be out of the office starting 1/17/2005 and will not return until 2/7/2005. I am on vacation and back on 02/07/2005. Please contact Paul Bambah and Ravindra Dabbiru for AS400 issues. Thanks.
