Re: Structured Streaming - Can I start using it?

2017-03-14 Thread Adline Dsilva

On 14 Mar 2017 4:19 p.m., Gaurav Pandya wrote:
Thanks a lot Michal & Ofir for your insights.

To Ofir - I have not yet finalized my Spark Streaming code; it is still a work 
in progress. Now that Structured Streaming is available, I thought I would 
rewrite the code to get the maximum benefit in the future. As of now, there are 
no specific functional or performance issues, nor do I need to leverage any new 
API. This is purely with future considerations in mind.

Thanks
Gaurav

On Tue, Mar 14, 2017 at 1:05 PM, Ofir Manor wrote:
To add to what Michael said, my experience was that Structured Streaming in 2.0 
was half-baked / alpha, but in 2.1 it is significantly more robust. Also, a lot 
of its "missing functionality" was not available in Spark Streaming either way.
HOWEVER, you mentioned that you are thinking about rewriting your existing Spark 
Streaming code... May I ask why you need a rewrite? Do you have specific 
functional or performance issues? A specific new use case, or a specific new 
API you want to leverage?
Changing an existing, working solution has its costs, both in dev time and ops 
time (changes to monitoring, troubleshooting, etc.), so I think you should know 
what you want to achieve here and ask / prototype whether the current release 
fits it.


Ofir Manor

Co-Founder & CTO | Equalum

Mobile: +972-54-7801286 | Email: ofir.ma...@equalum.io

On Mon, Mar 13, 2017 at 9:45 PM, Michael Armbrust wrote:
I think it's very, very unlikely that it will get withdrawn. The primary reason 
the APIs are still marked experimental is that we like to have several releases 
before committing to interface stability (in particular, the interfaces for 
writing custom sources and sinks are likely to evolve). Also, there are 
currently quite a few limitations in the types of queries we can run (e.g. 
multiple aggregations are disallowed, and we don't support stream-stream joins 
yet). In those cases, though, we explicitly say it's not supported when you try 
to start your stream.

For the use cases that are supported in 2.1, though (streaming ETL, event-time 
aggregation, etc.), I'll say that we have been using it in production for 
several months, and we have customers doing the same.
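
As a rough illustration of the supported event-time aggregation path, a minimal 
Scala sketch against Spark 2.1 (not code from this thread; the source path, 
schema, and window size below are placeholders):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.window
import org.apache.spark.sql.types._

val spark = SparkSession.builder.appName("streaming-etl").getOrCreate()
import spark.implicits._

// Placeholder schema and path for a streaming JSON source.
val schema = StructType(Seq(
  StructField("eventTime", TimestampType),
  StructField("userId", StringType)))

val events = spark.readStream.schema(schema).json("/data/events")

// Event-time aggregation: count events per user per 10-minute window.
val counts = events
  .groupBy(window($"eventTime", "10 minutes"), $"userId")
  .count()

val query = counts.writeStream
  .outputMode("complete")  // complete mode re-emits the full aggregate each trigger
  .format("console")
  .start()

query.awaitTermination()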

On Mon, Mar 13, 2017 at 11:21 AM, Gaurav1809 wrote:
I read in the Spark documentation that Structured Streaming is still ALPHA in
Spark 2.1 and the APIs are still experimental. Should I use it to rewrite my
existing Spark Streaming code? It looks like it is not yet production ready.
What happens if the Structured Streaming project gets withdrawn?




Resizing Image with Scrimage in Spark

2016-10-17 Thread Adline Dsilva
Hi All,

I have a Hive table which contains around 500 million photos (user profile 
pictures) stored as hex strings; the total size of the table is 5 TB. I'm 
trying to build a solution where the images can be retrieved in real time.

Current solution: resize the images and index them, along with the user 
profile, into Solr. For resizing, I'm using a Scala library called scrimage.

While running the UDF I'm getting the error below:
Serialization stack:
- object not serializable (class: com.sksamuel.scrimage.Image, value: Image 
[width=767, height=1024, type=2])
- field (class: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC, name: imgR, type: 
class com.sksamuel.scrimage.Image)

Can anyone suggest a way to overcome this error?
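
One idea I am testing is to construct the Image entirely inside the UDF, so 
that no com.sksamuel.scrimage.Image instance is captured by the closure. A 
rough sketch, assuming the scrimage 2.x API (photosDf, photo_hex, and the 
100x100 target size are placeholders):

import javax.xml.bind.DatatypeConverter
import com.sksamuel.scrimage.Image
import com.sksamuel.scrimage.nio.PngWriter
import org.apache.spark.sql.functions.{col, udf}

// All scrimage objects live and die inside the UDF body; only serializable
// types (String in, Array[Byte] out) cross the driver/executor boundary.
val resizeUdf = udf { (hex: String) =>
  implicit val writer: PngWriter = PngWriter.MaxCompression
  val bytes = DatatypeConverter.parseHexBinary(hex)
  Image(bytes).scaleTo(100, 100).bytes  // placeholder thumbnail size
}

val resized = photosDf.withColumn("thumbnail", resizeUdf(col("photo_hex")))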

Regards,
Adline




RE: Anyone got a good solid example of integrating Spark and Solr

2016-09-14 Thread Adline Dsilva
Hi
Take a look at https://github.com/lucidworks/spark-solr . It supports 
authentication with Kerberized Solr. Unfortunately, this implementation only 
supports Solr 5.x and above, while CDH ships Solr 4.x. One option is to use 
Apache Solr 6.x with CDH.
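
A minimal read sketch against its DataFrame API (the option names zkhost, 
collection, and query follow the spark-solr README; the values below are 
placeholders, and spark is assumed to be an existing SparkSession):

// For a Kerberized Solr, the JAAS config is usually supplied via JVM options,
// e.g. spark.executor.extraJavaOptions=-Djava.security.auth.login.config=/path/jaas-client.conf
val solrDf = spark.read
  .format("solr")
  .option("zkhost", "zk1:2181,zk2:2181/solr")  // placeholder ZooKeeper connect string
  .option("collection", "profiles")            // placeholder collection name
  .option("query", "*:*")
  .load()

solrDf.show(5)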

Regards,
Adline

From: Nkechi Achara
Sent: Wednesday, September 14, 2016 7:52 PM
To: user@spark.apache.org
Subject: Anyone got a good solid example of integrating Spark and Solr

Hi All,

I am trying to find some good examples of how to integrate Spark and Solr, and 
I'm coming up blank. Basically, the spark-solr implementation does not seem to 
work correctly with CDH 5.5.2 (the 1.5.x branch), where I am hitting various 
dependency-related issues that I have not been able to fully unravel.

I have also attempted a custom solution, where I copy the Kerberos token and 
JAAS config to each executor and set the necessary auth properties, but this 
is still prone to failure due to serialization and Kerberos auth issues.

Does anyone have an example of querying Solr in a distributed way where 
Kerberos is involved?

Thanks,

K




RE: Window Functions with SQLContext

2016-08-31 Thread Adline Dsilva
Hi,
  Use the function rowNumber instead of row_number:

df1.withColumn("row_number", rowNumber().over(w));
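
For completeness, an equivalent Scala sketch, assuming Spark 1.5/1.6 (where 
rowNumber still exists; it was later deprecated in favor of row_number) and a 
HiveContext, which the error message says window functions require in 1.x; df1 
with an assetId column is taken from the question:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.rowNumber

val w = Window.partitionBy("assetId").orderBy("assetId")
val df2 = df1.withColumn("row_number", rowNumber().over(w))
df2.show(false)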

Regards,
Adline

From: saurabh3d [saurabh.s.du...@oracle.com]
Sent: 01 September 2016 13:16
To: user@spark.apache.org
Subject: Window Functions with SQLContext

Hi All,

As per SPARK-11001, window functions should be supported by SQLContext. But
when I try to run

SQLContext sqlContext = new SQLContext(jsc);
WindowSpec w = Window.partitionBy("assetId").orderBy("assetId");
DataFrame df_2 = df1.withColumn("row_number", row_number().over(w));
df_2.show(false);

it fails with:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Could not
resolve window function 'row_number'. Note that, using window functions
currently requires a HiveContext;

This code runs fine with HiveContext.
Any idea what's going on? Is this a known issue, and is there a workaround
to make window functions work without HiveContext?

Thanks,
Saurabh



