Re: Apache Project

2021-04-18 Thread Timothy Spann
I have a few listed here

https://github.com/tspannhw/EverythingApacheNiFi


On Sun, Apr 18, 2021 at 10:22 AM Pinnamareddy, Gopal <
gopal.pinnamare...@alionscience.com> wrote:

> I’m looking to implement a NiFi project from scratch. I have installed
> NiFi on an EC2 server and have the business cases. Please let me know of any
> webinars or sessions available for project implementations. Looking
> forward to hearing from you.
>
>
> Thank you
> -Gopal
>
> --
*Tim Spann*
 Principal DataFlow Field Engineer - 609-250-5894 - https://dev.to/tspannhw -
@*PaaSDev*



Using Apache MXNet in Production Deep Learning Streaming Pipelines

2020-12-04 Thread Timothy Spann
Attached are the slides and the live video.

Here is a link to the MP4:

https://app.box.com/s/ufy3iary8rak9hj4fmpppfum72smlmt2

The PDF slides:
https://app.box.com/s/r00la1r3mpupquyhqk50w8zxsxp8d6nh

*Tim Spann*
 Principal DataFlow Field Engineer - 609-250-5894 - https://dev.to/tspannhw -
@*PaaSDev*



Re: Submission details for Apache MXNet Day

2020-12-02 Thread Timothy Spann
Sure.

*Tim Spann*
 Principal DataFlow Field Engineer - 609-250-5894 - https://dev.to/tspannhw -
@*PaaSDev*

On Wed, Dec 2, 2020 at 12:31 PM Vartika Singh  wrote:

> Hello presenters,
>
> Thank you for offering to speak at Apache MXNet Day.
>
> For your submissions, please make sure that the video resolution is at or
> below 1920x1080 and 30 fps, and that the file is in .MP4 format. Please use
> a slide theme that your organization allows, or keep the slides neutral.
>
> Thank you
>
> Vartika


Re: Present at Apache MXNet Day 2020 on Using Apache MXNet in Production Deep Learning Streaming Pipelines

2020-12-01 Thread Timothy Spann
Will do, thanks.

*Tim Spann*
 Principal DataFlow Field Engineer - 609-250-5894 - https://dev.to/tspannhw -
@*PaaSDev*

On Tue, Dec 1, 2020 at 2:32 AM Vartika Singh  wrote:

> Hi Timothy,
>
> Quick note.
>
> Please send a *link to a downloadable recording* to events@mxnet.apache.org
>
> *From: *Timothy Spann 
> *Date: *Monday, November 30, 2020 at 10:30 PM
> *To: *Vartika Singh 
> *Cc: *Sam Skalicky , Triston Cao <
> trist...@nvidia.com>, "Zha, Sheng" 
> *Subject: *Re: Present at Apache MXNet Day 2020 on Using Apache MXNet in
> Production Deep Learning Streaming Pipelines
>
> I confirm my presentation and acceptance. I will send in a recorded talk
> by Dec 6.
>
>
>
> On Tue, Dec 1, 2020 at 1:21 AM Vartika Singh  wrote:
>
> Hello Timothy,
>
>
>
> The Apache MXNet team has accepted your talk. We are very excited to have
> you present at this event. Your talk has been scheduled for 11:56 AM PST on
> Dec 14th for 15 minutes.
>
>
>
> The agenda/time is still evolving. We will inform you if there are any
> changes.
>
> Here is the updated website -  https://s.apache.org/MXNet2020
>
>
>
> *Action Required:*
>
>    - We request that you send your recorded talk (15 minutes) to us by
>      midnight PST on Dec 6th.
>    - Please respond by end of day tomorrow to confirm your presentation and
>      attendance.
>
>
>
> *Questions and Answers:*
>
> We plan to have a Slack channel running in parallel with your recording, where
> you can respond to questions while your recording plays. We will add the Slack
> channel names to the respective talks in a week.
>
>
>
> Please reach out if you have any questions. We will send out the webcast
> details by end of Wednesday.
>
>
>
> Thank you
>
> Vartika
>
>
>
>
>
> *From: *Timothy Spann 
> *Date: *Monday, November 23, 2020 at 11:36 AM
> *To: *Vartika Singh 
> *Cc: *apachemxnetday , "s...@amazon.com" <
> s...@amazon.com>, "Zha, Sheng" , Triston Cao <
> trist...@nvidia.com>
> *Subject: *Re: Call for Presentations for Apache MXNet Day
>
>
>
> Great, thanks for the update.
>
> *Tim Spann*
>
>  Principal DataFlow Field Engineer - 609-250-5894 -
> https://dev.to/tspannhw - @*PaaSDev*
>
> On Mon, Nov 23, 2020 at 2:27 PM Vartika Singh  wrote:
>
> Hello Timothy,
>
>
>
> We are reviewing the submissions. We aim to inform the submitters in a few
> days.
>
>
>
> Thank you
>
> Vartika
> --
>
> *From:* Timothy Spann 
> *Sent:* Monday, November 23, 2020 11:26 AM
> *To:* apachemxnetday
> *Cc:* s...@amazon.com; Zha, Sheng; Vartika Singh; Triston Cao
> *Subject:* Re: Call for Presentations for Apache MXNet Day
>
> I didn't see a reply in my email.
>
>
> *Tim Spann*
>
>  Principal DataFlow Field Engineer - 609-250-5894 -
> https://dev.to/tspannhw - @*PaaSDev*
>
> On Mon, Nov 23, 2020 at 11:49 AM apachemxnetday 
> wrote:
>
>
>
>
>
> *From: *Timothy Spann 
> *Date: *Friday, November 6, 2020 at 7:22 PM
> *To: *apachemxnetday 
> *Subject: *Call for Presentations for Apache MXNet Day
>
>
>
> *Title:*   "Using Apache MXNet in Production Deep Learning Streaming
> Pipelines"
>
>
>
> *Abstract:*
>
>
>
> As a Data Engineer I am often tasked with taking Machine Learning and Deep
> Learning models into production, sometimes in the cloud and sometimes at
> the edge.  I have developed Java code that allows us to run these models at
> the edge and as part of a sensor/webcam/images/data stream.  I have
> developed custom interfaces in Apache NiFi to enable real-time
> classification against MXNet models directly through the Java API or
> through DJL.AI's Java interface.   I will demo running models on NVIDIA
> Jetson Nanos and NVIDIA Xavier NX devices as well as in

Jira contributor access

2020-04-04 Thread Timothy Spann
Hi team,

I'd like to contribute to the NiFi project. Could I get contributor
access, please?
My username is tspannhw.

Thanks,

Tim

--
*Tim Spann*
 Principal DataFlow Field Engineer - 609-250-5894 - https://dev.to/tspannhw -
@*PaaSDev*



Re: Ingesting golden gate messages to Hbase using Nifi

2018-11-12 Thread Timothy Spann
Enforcing order is really tricky with Kafka. The only way to enforce
order is to reduce the number of nodes processing. You can have one NiFi
primary node read from Kafka and have it distribute the workload to the other
NiFi nodes and force ordering. Or you may want to batch the messages up into,
say, 10-15 minute chunks. Or you could use a staging table.

You could also have something mark the order to make sure they run in
order. I am not sure if GoldenGate can annotate them. I think there is
a Kafka number (the offset) that could help.
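
A rough sketch of the "mark the order" option, in plain Python rather than as a NiFi flow. It assumes each JSON message carries the op_ts and pos fields discussed in the quoted messages below, that op_ts is a fixed-format UTC string, and that pos is fixed-width and zero-padded, so plain string comparison sorts both correctly. GGEvent, to_epoch_millis, order_batch and assign_hbase_timestamps are made-up names for illustration, not NiFi or GoldenGate APIs. The batch is sorted by op_ts and then by pos, and each row key gets a strictly increasing millisecond timestamp for HBase:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass(frozen=True)
class GGEvent:
    """Hypothetical, trimmed-down view of one GoldenGate JSON message."""
    op_type: str  # "I", "U" or "D"
    op_ts: str    # assumed format: "YYYY-MM-DD HH:MM:SS.ffffff", in UTC
    pos: str      # assumed fixed-width, zero-padded trail file number + RBA
    row_key: str  # primary key of the source row, used as the HBase row key


def to_epoch_millis(op_ts: str) -> int:
    """Convert an op_ts string to whole milliseconds since the epoch."""
    parsed = datetime.strptime(op_ts, "%Y-%m-%d %H:%M:%S.%f")
    return (parsed.replace(tzinfo=timezone.utc) - EPOCH) // timedelta(milliseconds=1)


def order_batch(events: Iterable[GGEvent]) -> List[GGEvent]:
    """Sort one collected batch by op_ts, then by pos."""
    return sorted(events, key=lambda e: (e.op_ts, e.pos))


def assign_hbase_timestamps(events: Iterable[GGEvent]) -> List[Tuple[GGEvent, int]]:
    """Pair each event with a timestamp that strictly increases per row key."""
    last_ts: Dict[str, int] = {}
    stamped: List[Tuple[GGEvent, int]] = []
    for event in order_batch(events):
        ts = to_epoch_millis(event.op_ts)
        previous = last_ts.get(event.row_key)
        if previous is not None and ts <= previous:
            # Same millisecond as (or earlier than) the last write for this
            # row: bump by one millisecond so HBase keeps a distinct version.
            ts = previous + 1
        last_ts[event.row_key] = ts
        stamped.append((event, ts))
    return stamped


if __name__ == "__main__":
    batch = [
        GGEvent("U", "2018-11-05 03:07:00.000100", "00000000010000002000", "42"),
        GGEvent("I", "2018-11-05 03:07:00.000100", "00000000010000001000", "42"),
    ]
    for event, ts in assign_hbase_timestamps(batch):
        print(event.op_type, event.pos, ts)
```

This only orders what has been collected into one batch on one node. A message that arrives after its batch has been written would still need the staging table or a longer batching window.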


On Mon, Nov 12, 2018 at 12:16 PM Boris Tyukin  wrote:

> Faisal, BTW I stumbled upon this doc, which explains how the GoldenGate HBase
> handler works in a scenario similar to the one you've described:
>
> https://docs.oracle.com/goldengate/bd123210/gg-bd/GADBD/using-hbase-handler.htm#GADBD-GUID-1A9BA580-628B-48BD-9DC0-C3DF9722E0FB
>
> They provide an option to generate the timestamp for HBase on the client side,
> which is what I suggested earlier. In your case, you would need to build
> this logic in NiFi. I still think the op_ts, pos combo should give you a proper
> ordering of events (events sorted by op_ts and then by pos). Then you
> can come up with a rule to increment the actual timestamp for HBase by a
> millisecond, like Oracle does with their HBase handler.
>
> I'm really interested in what you end up doing; please share once you come up
> with a solution.
>
> Boris
>
> On Wed, Nov 7, 2018 at 7:53 AM Boris  wrote:
>
>> Sorry, I meant RBA. GG has a bunch of tokens you can add to your JSON file
>> - you can even create your own. POS should be good, and if op_ts does not
>> work for you, why not generate your own timestamp using POS (Now()
>> expression)? You can also add another token that identifies the transaction
>> sequence number and order by op_ts and then by transaction sequence number.
>> Please share what you end up doing.
>>
>> On Tue, Nov 6, 2018, 01:55 Faisal Durrani >
>>> Hi Boris,
>>>
>>> Thank you for your reply. Let me try explaining my data flow in detail.
>>> I am receiving the GG transactions in JSON format through Kafka, so I can
>>> only use the fields provided by the Kafka handler of GG (JSON pluggable
>>> format). I think you meant the RBA value instead of rbc. I don't think we can
>>> receive the RBA value in JSON, but there is a field called POS, which is a
>>> concatenation of the source trail file number and the RBA. So we can probably
>>> use that in the EnforceOrder processor. But if we don't use the timestamp
>>> information, then we will run into the HBase versioning issue. The idea
>>> behind using op_ts was to version each row of our target table and also
>>> to help us with the DML operations. We are using the PK of each table as the
>>> row key of the target HBase table. Every new transaction (update/delete) on
>>> the table is logically inserted as a new row, but since it has the same PK,
>>> we can see the versions of each row. The operation with the highest
>>> timestamp is the valid state of the row. I tested the EnforceOrder processor
>>> with the Kafka offset, and it skips all the records which arrive later than
>>> the older offset, and I don't understand why. If I decide to use EnforceOrder
>>> on POS and use the default timestamp in HBase, then it will skip ordering
>>> the Kafka messages that arrive late, and that will leave the data out of
>>> sync. In addition to this, I've read that EnforceOrder only orders the rows
>>> on a single node, while we have a 5-node cluster. So I'm not sure how to
>>> combine all the flow files together on a single node. (I know how to
>>> distribute them, i.e. by using S2S-RPG.)
>>>
>>> I hope I have been able to explain my situation. Kindly let me know
>>> your views on this.
>>>
>>> Regards,
>>> Faisal
>>>
>>>
>>> On Mon, Nov 5, 2018 at 11:18 PM Boris Tyukin 
>>> wrote:
>>>
 Hi Faisal, I am not Timothy, but you raise an interesting problem we
 might face soon as well. I did not expect the situation you described and I
 thought transaction time would be different.

 Our intent was to use op_ts to enforce order, but another option is to
 use the GG rbc value or the Oracle rowscn value - did you consider them? The GG
 RBC should identify a unique transaction, and within every transaction you
 can also get the operation number. You can also get the trail file number
 and trail file position. GG is really powerful and gives you a bunch of
 data elements that you can enable on your message.


 https://docs.oracle.com/goldengate/1212/gg-winux/GWUAD/wu_fileformats.htm#GWUAD735

 The Logdump tool is an awesome way to look into your trail files and see
 what's in there.

 Boris



 On Mon, Nov 5, 2018 at 3:07 AM Faisal Durrani 
 wrote:

> Hi Timothy ,
>
> Hope you are doing well. We have been using your data flow(
> https://community.hortonworks.com/content/kbentry/155527/ingesting-golden-gate-records-from-apache-kafka-an.html#
> )
> with slight modifications to store the data in 

Re: [DISCUSS] Beam

2016-11-23 Thread Timothy Spann
Reactive Streams is very cool, but only used by a few frameworks.

https://en.wikipedia.org/wiki/Reactive_Streams

Its adoption is slow, but starting to grow. I’ve worked with RxJava and
Project Reactor. Once Spring 5 has been out for a bit, it will probably be time
to look at the Reactive Streams API.



Re: [DISCUSS] Promoting on social channels

2016-11-04 Thread Timothy Spann
I am very active on Twitter and LinkedIn, so I’ll do this.

Tim Spann  
https://community.hortonworks.com/users/9304/tspann.html
@PaaSDeV  http://www.meetup.com/futureofdata-princeton/

 




Re: [ANNOUNCE] Welcome Joey Frazee as new Apache Streams Committer and PPMC member

2016-11-01 Thread Timothy Spann
Congrats!

Tim Spann  
Solutions Engineer

 



Re: [DISCUSS] Coding style conventions + auto-formatting

2016-10-28 Thread Timothy Spann
>

+1 for Google.  Easy to import into any IDE and enforced with checkstyle.


>
> - https://github.com/twitter/commons/blob/master/src/java/com/twitter/common/styleguide.md
>
> - https://opennlp.apache.org/code-conventions.html
>
> I'm sure there are more. Thoughts?
>
> Whatever happens, I'd suggest it just get enforced via Maven so we don't
> have to kick PRs back and forth or keep discussing it. To that effect I
> opened https://issues.apache.org/jira/browse/STREAMS-449

https://maven.apache.org/plugins-archives/maven-checkstyle-plugin-2.16/

And you can check that from IDEs as well.

Tim Spann 



Re: Tech debt cleanup for Future releases

2016-10-24 Thread Timothy Spann
That makes a lot of sense.

Maybe run FindBugs, PMD and CheckStyle.



AVRO vs Parquet

2016-03-02 Thread Timothy Spann
Which format is best for SparkSQL ad hoc queries and general data
storage?

There are lots of specialized cases, but generally we access some, but not all,
of the available columns over a reasonable subset of the data.
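
To make that access pattern concrete, here is a toy sketch in plain Python. It is not Spark, Avro or Parquet code; the sample rows and the function names (to_columns, select_from_rows, select_from_columns) are made up for illustration. It only shows the layout difference: a row-oriented format such as Avro stores whole records together, while a columnar format such as Parquet stores each column together, so a query that needs a few columns can leave the others untouched.

```python
from typing import Any, Dict, List

# Made-up sample data: four columns, of which a query needs only two.
ROWS: List[Dict[str, Any]] = [
    {"id": 1, "name": "alpha", "city": "Princeton", "amount": 10.0},
    {"id": 2, "name": "beta", "city": "Trenton", "amount": 20.5},
    {"id": 3, "name": "gamma", "city": "Newark", "amount": 7.25},
]


def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot a list of records into one list of values per column."""
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def select_from_rows(rows: List[Dict[str, Any]], wanted: List[str]) -> List[Dict[str, Any]]:
    """Row layout: every full record is visited, then trimmed to the wanted fields."""
    return [{name: row[name] for name in wanted} for row in rows]


def select_from_columns(columns: Dict[str, List[Any]], wanted: List[str]) -> Dict[str, List[Any]]:
    """Column layout: only the wanted columns are touched at all."""
    return {name: columns[name] for name in wanted}


if __name__ == "__main__":
    wanted = ["id", "amount"]
    print(select_from_rows(ROWS, wanted))
    print(select_from_columns(to_columns(ROWS), wanted))
```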

I am leaning towards Parquet as it has great support in Spark.

I also have to consider any file on HDFS may be accessed from other tools like 
Hive, Impala, HAWQ.

Suggestions?
—
airis.DATA
Timothy Spann, Senior Solutions Architect
C: 609-250-5894
http://airisdata.com/
http://meetup.com/nj-datascience




Re: Spark Certification

2016-02-11 Thread Timothy Spann
I was wondering that as well.

Also is it fully updated for 1.6?

Tim
http://airisdata.com/
http://sparkdeveloper.com/


From: naga sharathrayapati
Date: Wednesday, February 10, 2016 at 11:36 PM
To: "user@spark.apache.org"
Subject: Spark Certification

Hello All,

I am planning on taking the Spark Certification, and I was wondering whether
one has to be well versed in MLlib and GraphX as well.

Please advise

Thanks


Please Add Our Meetup to the Spark Meetup List

2016-02-05 Thread Timothy Spann
Our meetup is

NJ Data Science - Apache Spark <http://www.meetup.com/nj-datascience/>
Princeton, NJ


Past Meetups:

Spark Streaming <http://www.meetup.com/nj-datascience/events/222851584/>
by Prasad Sripathi, airisDATA <http://www.mongodbdev.com/our-team/>
August 13, 2015

ELK Stack and Spark <http://www.meetup.com/nj-datascience/events/222711861/>
June 30, 2015

Spark Hands-on Intro workshop <http://www.meetup.com/nj-datascience/events/222661453/>
by Kristina Rogale Plazonic, airisDATA <http://www.mongodbdev.com/our-team/>
June 18, 2015

Graph Analytics – Titan and Cassandra <http://www.meetup.com/nj-datascience/events/222637527/>
by Isaac Rieksts (Slides <http://www.slideshare.net/irieksts/titan-tinkerpopmeetup>)
June 4, 2015

Machine Learning Fundamentals <http://www.meetup.com/nj-datascience/events/222079398/>
by SriSatish Ambati, H2O ML Platform
May 14, 2015
Slides <https://github.com/h2oai/h2o-meetups/tree/master/2015_05_14_H2O_Overview>


—
airis.DATA
Timothy Spann, Senior Solutions Engineer
C: 609-250-5894