Re: Community chat?

2021-02-24 Thread Marta Paes Moreira
Ah! That freenode channel dates back to... 2014? The community is not
currently maintaining any channels other than the mailing list (and Stack
Overflow).

But this is something we're looking into, as it's coming up more and more
frequently. Would Slack be your first pick? Or would something async but
easier to interact with also work, like a Discourse forum?

Thanks for bringing this up!

Marta



On Mon, Feb 22, 2021 at 10:03 PM Yuval Itzchakov  wrote:

> A dedicated Slack would be awesome.
>
> On Mon, Feb 22, 2021, 22:57 Sebastián Magrí  wrote:
>
>> Is there any chat from the community?
>>
>> I saw the freenode channel but it's pretty dead.
>>
>> A lot of the time, a more chat-like venue for discussing things
>> synchronously or just sharing ideas turns out to be very useful and
>> stimulates the community.
>>
>> --
>> Sebastián Ramírez Magrí
>>
>


Re: Size of state for any known production use case

2020-02-13 Thread Marta Paes Moreira
Hi, Reva.

If you are looking for the maximum known state size, I believe Alibaba is
using Flink at the largest scale in production [1].

There are also other examples at varying scales scattered across Flink
Forward talks [2]. In particular, this Netflix talk [3] should be
interesting to you.

Marta

[1]
https://www.itnextsummit.com/wp-content/uploads/2019/11/Stephan_Ewen_Stream_Processing_Beyond_Streaming.pdf
(Slide 3)
[2] https://www.youtube.com/channel/UCY8_lgiZLZErZPF47a2hXMA/videos
[3] https://www.youtube.com/watch?v=2C44mUPlx5o

On Wed, Feb 12, 2020 at 10:42 PM RKandoji  wrote:

> Hi Team,
>
> I've done a POC using Flink and planning to give a presentation about my
> learnings and share the benefits of using Flink.
>
> I understand that companies are using Flink to handle terabytes of state,
> but it would be great if you could point me to any reference of a company
> using Flink production for a known amount of state. Or any other related
> links where I can get these details?
>
> Basically, I want to provide the known maximum limit of state that can be
> stored. This is needed because my use case requires performing stream joins
> on unbounded data (although the data is unbounded, it's not going to be super
> huge, like 10TB).
>
>
> Thanks,
> Reva
>


Re: Alink and Flink ML

2020-03-09 Thread Marta Paes Moreira
Hi, Flavio.

Indeed, Becket is the best person to answer this question, but as far as I
understand the idea is that Alink will be contributed back to Flink in the
form of a refactored Flink ML library (sitting on top of the Table API)
[1]. You can follow the progress of these efforts by tracking FLIP-39 [2].

[1] https://developpaper.com/why-is-flink-ai-worth-looking-forward-to/
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs

On Tue, Mar 3, 2020 at 2:02 PM Gary Yao  wrote:

> Hi Flavio,
>
> I am looping in Becket (cc'ed) who might be able to answer your question.
>
> Best,
> Gary
>
> On Tue, Mar 3, 2020 at 12:19 PM Flavio Pompermaier 
> wrote:
>
>> Hi to all,
>> since Alink has been open sourced, is there any good reason to keep both
>> Flink ML and Alink?
>> From what I understood, Alink already contains the best ML implementations
>> available for Flink... am I wrong?
>> Maybe it could make sense to replace the current Flink ML with that of
>> Alink... or is that impossible?
>>
>> Cheers,
>> Flavio
>>
>


Re: How to handle timestamps in Flink SQL

2020-03-23 Thread Marta Paes Moreira
Hi, 吴志勇.

Please use the *user-zh* mailing list (in CC) to get support in Chinese.

Thanks!

Marta

On Mon, Mar 23, 2020 at 8:35 AM 吴志勇 <1154365...@qq.com> wrote:

> As the subject says:
> I am writing JSON-formatted data to Kafka:
> {"id":5,"price":40,"timestamp":1584942626828,"type":"math"}
> {"id":2,"price":70,"timestamp":1584942629638,"type":"math"}
> {"id":2,"price":70,"timestamp":1584942634951,"type":"math"}
> 
> The timestamp field is a 13-digit (millisecond) epoch timestamp. How should it be handled as a time type in the corresponding SQL table?
>   - name: bookpojo
> type: source-table
> connector:
>   property-version: 1
>   type: kafka
>   version: "universal"
>   topic: pojosource
>   startup-mode: earliest-offset
>   properties:
> zookeeper.connect: localhost:2181
> bootstrap.servers: localhost:9092
> group.id: testGroup
> format:
>   property-version: 1
>   type: json
>   schema: "ROW"
> schema:
>   - name: id
> data-type: INT
>   - name: type
> data-type: STRING
>   - name: price
> data-type: INT
>   - name: timestamp
> data-type: TIMESTAMP(3)
>
> The configuration above does not seem to work.
>
> I found the following note on the official website:
> String and time types: time types must be formatted according to the Java SQL
> time format, with millisecond precision. For example: 2018-01-01 for dates,
> 20:43:59 for times, and 2018-01-01 20:43:59.999 for timestamps.
> Does the time value have to be a string with milliseconds?
>
> Thanks.
>
>


Re: subscribe messages

2020-03-25 Thread Marta Paes Moreira
Hi, Jianhui!

To subscribe, please send an e-mail to user-subscr...@flink.apache.org instead.
For more information on mailing list subscriptions, check [1].

[1] https://flink.apache.org/community.html#mailing-lists

On Wed, Mar 25, 2020 at 10:07 AM Jianhui <980513...@qq.com> wrote:

>
>


Re: [Third-party Tool] Flink memory calculator

2020-04-01 Thread Marta Paes Moreira
Hey, Yangze.

I'd like to suggest that you submit this tool to Flink Community Pages [1].
That way it can get more exposure and it'll be easier for users to find it.

Thanks for your contribution!

[1] https://flink-packages.org/

On Tue, Mar 31, 2020 at 9:09 AM Yangze Guo  wrote:

> Hi, there.
>
> In the latest version, the calculator supports dynamic options. You
> could append all your dynamic options to the end of "bin/calculator.sh
> [-h]".
> Since "-tm" will be deprecated eventually, please replace it with
> "-Dtaskmanager.memory.process.size=".
>
> Best,
> Yangze Guo
>
> On Mon, Mar 30, 2020 at 12:57 PM Xintong Song 
> wrote:
> >
> > Hi Jeff,
> >
> > I think the purpose of this tool is to allow users to play with the memory
> configurations without needing to actually deploy the Flink cluster or even
> have a job. For sanity checks, we currently have them in the start-up
> scripts (for standalone clusters) and resource managers (on K8s/Yarn/Mesos).
> >
> > I think it makes sense to do the checks earlier, i.e. on the client side.
> But I'm not sure if JobListener is the right place. IIUC, JobListener is
> invoked before submitting a specific job, while the mentioned checks
> validate Flink's cluster level configurations. It might be okay for a job
> cluster, but does not cover the scenarios of session clusters.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Mon, Mar 30, 2020 at 12:03 PM Yangze Guo  wrote:
> >>
> >> Thanks for your feedbacks, @Xintong and @Jeff.
> >>
> >> @Jeff
> >> I think it would always be good to leverage existing logic in Flink, such
> >> as JobListener. However, this calculator does not only aim to check for
> >> configuration conflicts, it also aims to provide the calculation result to
> >> the user before the job is actually deployed, in case there is any
> >> unexpected configuration. It's a good point that we need to parse the
> >> dynamic configs. I prefer to parse the dynamic configs and CLI
> >> commands in bash instead of adding a hook in JobListener.
> >>
> >> Best,
> >> Yangze Guo
> >>
> >> On Mon, Mar 30, 2020 at 10:32 AM Jeff Zhang  wrote:
> >> >
> >> > Hi Yangze,
> >> >
> >> > Does this tool just parse the configuration in flink-conf.yaml ?
> Maybe it could be done in JobListener [1] (we should enhance it by adding a
> hook before job submission), so that it could cover all the cases (e.g.
> parameters coming from the command line).
> >> >
> >> > [1]
> https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/execution/JobListener.java#L35
> >> >
> >> >
> >> > Yangze Guo  于2020年3月30日周一 上午9:40写道:
> >> >>
> >> >> Hi, Yun,
> >> >>
> >> >> I'm sorry that it currently cannot handle that. But I think it is a
> >> >> really good idea and that feature will be added in the next version.
> >> >>
> >> >> Best,
> >> >> Yangze Guo
> >> >>
> >> >> On Mon, Mar 30, 2020 at 12:21 AM Yun Tang  wrote:
> >> >> >
> >> >> > Very interesting and convenient tool, just a quick question: could
> this tool also handle deployment cluster commands like "-tm" mixed with
> configuration in `flink-conf.yaml` ?
> >> >> >
> >> >> > Best
> >> >> > Yun Tang
> >> >> > 
> >> >> > From: Yangze Guo 
> >> >> > Sent: Friday, March 27, 2020 18:00
> >> >> > To: user ; user...@flink.apache.org <
> user...@flink.apache.org>
> >> >> > Subject: [Third-party Tool] Flink memory calculator
> >> >> >
> >> >> > Hi, there.
> >> >> >
> >> >> > In release-1.10, the memory setup of task managers has changed a
> lot.
> >> >> > I would like to provide here a third-party tool to simulate and get
> >> >> > the calculation result of Flink's memory configuration.
> >> >> >
> >> >> >  Although there is already a detailed setup guide[1] and migration
> >> >> > guide[2] officially, the calculator could further allow users to:
> >> >> > - Verify if there is any conflict in their configuration. The
> >> >> > calculator is more lightweight than starting a Flink cluster,
> >> >> > especially when running Flink on Yarn/Kubernetes. User could make
> sure
> >> >> > their configuration is correct locally before deploying it to
> external
> >> >> > resource managers.
> >> >> > - Get all of the memory configurations before deploying. User may
> set
> >> >> > taskmanager.memory.task.heap.size and
> taskmanager.memory.managed.size.
> >> >> > But they also want to know the total memory consumption of Flink.
> With
> >> >> > this tool, users could get all of the memory configurations they
> are
> >> >> > interested in. If anything is unexpected, they would not need to
> >> >> > re-deploy a Flink cluster.
> >> >> >
> >> >> > The repo link of this tool is
> >> >> > https://github.com/KarmaGYZ/flink-memory-calculator. It reuses the
> >> >> > BashJavaUtils.jar of Flink and ensures the calculation result is
> >> >> > exactly the same as your Flink dist. For more details, please take
> a
> >> >> > look at the README.
> >> >> >
> >> >> > Any feedback or suggestion is welcomed!
> >> >> >
> >> >> > [1]
> https:/

Re: Anomaly detection Apache Flink

2020-04-03 Thread Marta Paes Moreira
Hi, Salvador.

You can find some more examples of real-time anomaly detection with Flink
in these presentations from Microsoft [1] and Salesforce [2] at Flink
Forward. This blogpost [3] also describes how to build that kind of
application using Kinesis Data Analytics (based on Flink).

Let me know if these resources help!

[1] https://www.youtube.com/watch?v=NhOZ9Q9_wwI
[2] https://www.youtube.com/watch?v=D4kk1JM8Kcg
[3]
https://towardsdatascience.com/real-time-anomaly-detection-with-aws-c237db9eaa3f

On Fri, Apr 3, 2020 at 11:37 AM Salvador Vigo  wrote:

> Hi there,
> I am working on an approach to run some experiments related to real-time
> anomaly detection with Apache Flink. I would like to know if there are
> already some open issues in the community.
> The only example I found was the one from Scott Kidder
>  and the Mux platform, 2017. If anyone
> is already working on this topic or knows of some related work or
> publication, I will be grateful.
> Best,
>


Re: Anomaly detection Apache Flink

2020-04-03 Thread Marta Paes Moreira
Forgot to mention that you might also want to have a look into Flink CEP
[1], Flink's library for Complex Event Processing.

It allows you to define and detect event patterns over streams, which can
come in pretty handy for anomaly detection.

[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/cep.html
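
To make this concrete, below is a minimal CEP sketch that flags two consecutive high readings from the same sensor. The event type, field names and threshold are made up for the example, and it assumes the flink-cep dependency is on the classpath — treat it as a starting point, not a recommended detector.

import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternSelectFunction;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.util.List;
import java.util.Map;

public class AnomalyDetectionSketch {

    // Hypothetical event type, just for illustration.
    public static class SensorReading {
        public String sensorId;
        public double temperature;

        public SensorReading() {}

        public SensorReading(String sensorId, double temperature) {
            this.sensorId = sensorId;
            this.temperature = temperature;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // In a real job this would come from Kafka, Kinesis, etc.
        DataStream<SensorReading> readings = env.fromElements(
            new SensorReading("sensor-1", 98.0),
            new SensorReading("sensor-1", 102.5),
            new SensorReading("sensor-1", 103.1));

        // Two consecutive readings above the threshold count as an anomaly.
        Pattern<SensorReading, ?> pattern = Pattern.<SensorReading>begin("first")
            .where(new SimpleCondition<SensorReading>() {
                @Override
                public boolean filter(SensorReading r) {
                    return r.temperature > 100.0;
                }
            })
            .next("second")
            .where(new SimpleCondition<SensorReading>() {
                @Override
                public boolean filter(SensorReading r) {
                    return r.temperature > 100.0;
                }
            });

        // Apply the pattern per sensor and emit a simple alert string for each match.
        PatternStream<SensorReading> matches =
            CEP.pattern(readings.keyBy(r -> r.sensorId), pattern);

        matches.select(new PatternSelectFunction<SensorReading, String>() {
            @Override
            public String select(Map<String, List<SensorReading>> match) {
                return "Possible anomaly on sensor " + match.get("second").get(0).sensorId;
            }
        }).print();

        env.execute("CEP anomaly detection sketch");
    }
}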

On Fri, Apr 3, 2020 at 6:08 PM Nienhuis, Ryan  wrote:

> I would also have a look at the random cut forest algorithm. This is the
> base algorithm that is used for anomaly detection in several AWS services
> (Quicksight, Kinesis Data Analytics, etc.). It doesn’t help with getting it
> working with Flink, but may be a good place to start for an algorithm.
>
>
>
> https://github.com/aws/random-cut-forest-by-aws
>
>
>
> Ryan
>
>
>
> *From:* Marta Paes Moreira 
> *Sent:* Friday, April 3, 2020 5:25 AM
> *To:* Salvador Vigo 
> *Cc:* user 
> *Subject:* RE: [EXTERNAL] Anomaly detection Apache Flink
>
>
> Hi, Salvador.
>
> You can find some more examples of real-time anomaly detection with Flink
> in these presentations from Microsoft [1] and Salesforce [2] at Flink
> Forward. This blogpost [3] also describes how to build that kind of
> application using Kinesis Data Analytics (based on Flink).
>
> Let me know if these resources help!
>
> [1] https://www.youtube.com/watch?v=NhOZ9Q9_wwI
> [2] https://www.youtube.com/watch?v=D4kk1JM8Kcg
> [3]
> https://towardsdatascience.com/real-time-anomaly-detection-with-aws-c237db9eaa3f
>
>
>
> On Fri, Apr 3, 2020 at 11:37 AM Salvador Vigo 
> wrote:
>
> Hi there,
>
> I am working in an approach to make some experiments related with anomaly
> detection in real time with Apache Flink. I would like to know if there are
> already some open issues in the community.
>
> The only example I found was the one of Scott Kidder
> <https://mux.com/team/scott-kidder> and the Mux platform, 2017. If any
> one is already working in this topic or know some related work or
> publication I will be grateful.
>
> Best,
>
>


Re: [ANNOUNCE] Apache Flink Stateful Functions 2.0.0 released

2020-04-07 Thread Marta Paes Moreira
Thank you for managing the release, Gordon — you did a tremendous job! And
to everyone else who worked on pushing it through.

Really excited about the new use cases that StateFun 2.0 unlocks for Flink
users and beyond!

Marta

On Tue, Apr 7, 2020 at 4:47 PM Hequn Cheng  wrote:

> Thanks a lot for the release and your great job, Gordon!
> Also thanks to everyone who made this release possible!
>
> Best,
> Hequn
>
> On Tue, Apr 7, 2020 at 8:58 PM Tzu-Li (Gordon) Tai 
> wrote:
>
>> The Apache Flink community is very happy to announce the release of
>> Apache Flink Stateful Functions 2.0.0.
>>
>> Stateful Functions is an API that simplifies building distributed
>> stateful applications.
>> It's based on functions with persistent state that can interact
>> dynamically with strong consistency guarantees.
>>
>> Please check out the release blog post for an overview of the release:
>> https://flink.apache.org/news/2020/04/07/release-statefun-2.0.0.html
>>
>> The release is available for download at:
>> https://flink.apache.org/downloads.html
>>
>> Maven artifacts for Stateful Functions can be found at:
>> https://search.maven.org/search?q=g:org.apache.flink%20statefun
>>
>> Python SDK for Stateful Functions published to the PyPI index can be
>> found at:
>> https://pypi.org/project/apache-flink-statefun/
>>
>> Official Docker image for building Stateful Functions applications is
>> currently being published to Docker Hub.
>> Dockerfiles for this release can be found at:
>> https://github.com/apache/flink-statefun-docker/tree/master/2.0.0
>> Progress for creating the Docker Hub repository can be tracked at:
>> https://github.com/docker-library/official-images/pull/7749
>>
>> The full release notes are available in Jira:
>>
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346878
>>
>> We would like to thank all contributors of the Apache Flink community who
>> made this release possible!
>>
>> Cheers,
>> Gordon
>>
>


Re: How to use OpenTSDB as Source?

2020-04-22 Thread Marta Paes Moreira
Hi, Lucas.

There was a lot of refactoring in the Table API / SQL in the last release,
so the user experience is not ideal at the moment — sorry for that.

You can try using the DDL syntax to create your table, as shown in [1,2].
I'm CC'ing Timo and Jark, who should be able to help you further.

Marta

[1] https://flink.apache.org/news/2020/02/20/ddl.html
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/sql/create.html
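
Just to illustrate the direction: once you implement a table source factory for OpenTSDB and register it via Java SPI, the DDL route would let you declare the table roughly as below. The connector name and properties are purely hypothetical — there is no built-in OpenTSDB connector, so they would only resolve once your own factory handles them.

CREATE TABLE sensor_metrics (
  metric STRING,
  ts TIMESTAMP(3),
  `value` DOUBLE
) WITH (
  'connector.type' = 'opentsdb',             -- hypothetical identifier, resolved by your factory
  'connector.url' = 'http://localhost:4242'  -- hypothetical connection property
);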

On Tue, Apr 21, 2020 at 7:02 PM Lucas Kinne <
lucas.ki...@stud-mail.uni-wuerzburg.de> wrote:

> Hey guys,
>
> in a university project we are storing our collected sensor data in an 
> OpenTSDB
> database.
> I am now trying to use this database as a source in Apache Flink, but I
> can't seem to figure out how to do it.
>
> I have seen that there is no existing connector for this Database, but I
> read in the docs
> 
> that it is possible to implement a custom (Batch/Streaming)TableSource.
> There is a Java client for OpenTSDB
> , which
> could be used for that.
>
> So I created a new Java Class "OpenTSDBTableSource" that implements
> "StreamTableSource", "DefinedProctimeAttribute", "DefinedRowtimeAttribute"
> and "LookupableTableSource", as suggested in the docs.
> However, I have no idea how to register this TableSource. The
> "StreamExecutionEnvironment.addSource" requires a "SourceFunction"
> parameter instead of my "TableSource" and the
> "StreamTableEnvironment.registerTableSource"-Method is deprecated. There is
> a link to the topic "register a TableSource" on linked docs page, but the
> link seems to be dead, hence I found no other method on how to register a
> TableSource.
>
> I could also write a "SourceFunction" myself, pull the OpenTSDB database
> in there and return the DataStream from the fetched Collection, but I am
> not sure whether this is an efficient way.
> And if I did it this "manual" way, how do I avoid pulling the whole
> database every time?
>
> Any help is much appreciated, even if it is just a small pointer to the
> right direction.
>
> Thanks in advance!
>
> Sincerely,
> Lucas
>


Re: Task Assignment

2020-04-23 Thread Marta Paes Moreira
Hi, Navneeth.

If you *key* your stream using stream.keyBy(…), this will logically split
your input and all the records with the same key will be processed in the
same operator instance. This is the default behavior in Flink for keyed
streams and is handled transparently.

You can read more about it in the documentation [1].

[1]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#keyed-state-and-operator-state
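
As a minimal sketch of what that looks like (the tuple schema and values below are made up for illustration):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KeyedRoutingSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical input: (deviceId, reading) pairs.
        DataStream<Tuple2<String, Long>> readings = env.fromElements(
            Tuple2.of("device-1", 10L),
            Tuple2.of("device-2", 20L),
            Tuple2.of("device-1", 30L));

        // All records with the same deviceId end up in the same parallel
        // operator instance; keyed state is scoped to that key automatically.
        readings
            .keyBy(r -> r.f0)
            .sum(1)
            .print();

        env.execute("keyBy routing sketch");
    }
}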

On Thu, Apr 23, 2020 at 7:44 AM Navneeth Krishnan 
wrote:

> Hi All,
>
> Is there a way for an upstream operator to know how the downstream
> operator tasks are assigned? Basically I want to group my messages to be
> processed on slots in the same node based on some key.
>
> Thanks
>


Re: Flink Forward 2020 Recorded Sessions

2020-04-23 Thread Marta Paes Moreira
Hi, Sivaprasanna.

The talks will be up on Youtube sometime after the conference ends.

Today, the starting schedule is different (9AM CEST / 12:30PM IST / 3PM
CST) and more friendly to Europe, India and China. Hope you manage to join
some sessions!

Marta

On Fri, 24 Apr 2020 at 06:58, Sivaprasanna 
wrote:

> Hello,
>
> I had registered for Flink Forward 2020 and had attended a couple of
> sessions, but due to the odd timings and overlapping sessions in the same
> slot, I wasn't able to attend some interesting talks. I have received mails
> with links to rewatch some 2-3 webinars, but not all (of those that have happened yet).
> Where can I find the recorded sessions?
>
> Thanks,
> Sivaprasanna
>


Re: Task Assignment

2020-04-27 Thread Marta Paes Moreira
Sorry — I didn't understand you were dealing with multiple keys.

In that case, I'd recommend you read about key-group assignment [1] and
check the KeyGroupRangeAssignment class [2].

Key-groups are assigned to parallel tasks as ranges before the job is
started — this is also a well-defined behaviour in Flink, with implications
in state reassignment on rescaling. I'm afraid that if you try to hardwire
this behaviour into your code, the job might not be transparently
rescalable anymore.

[1] https://flink.apache.org/features/2017/07/04/flink-rescalable-state.html
[2]
https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/state/KeyGroupRangeAssignment.java
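
If you just want to inspect how a given key would be mapped, a minimal sketch could look like the one below. It relies on flink-runtime internals, so treat it as a debugging aid rather than a stable API; the key and parallelism values are arbitrary.

import org.apache.flink.runtime.state.KeyGroupRangeAssignment;

public class KeyGroupSketch {
    public static void main(String[] args) {
        int maxParallelism = 128; // Flink's default maximum parallelism
        int parallelism = 4;
        String key = "device-1";

        // Key group the key falls into (stable for a fixed max parallelism).
        int keyGroup = KeyGroupRangeAssignment.assignToKeyGroup(key, maxParallelism);

        // Index of the parallel operator instance that will process this key.
        int operatorIndex =
            KeyGroupRangeAssignment.assignKeyToParallelOperator(key, maxParallelism, parallelism);

        System.out.println("key group: " + keyGroup + ", operator index: " + operatorIndex);
    }
}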


On Fri, Apr 24, 2020 at 7:10 AM Navneeth Krishnan 
wrote:

> Hi Marta,
>
> Thanks for your response. What I'm looking for is something like data
> locality. If I have one TM which is processing a set of keys, I want to
> ensure all keys of the same type go to the same TM rather than using
> hashing to find the downstream slot. I could use a common key to do this,
> but I would have to parallelize as much as possible, since the number of
> incoming messages is too large to narrow down to a single key and
> process it.
>
> Thanks
>
> On Thu, Apr 23, 2020 at 2:02 AM Marta Paes Moreira 
> wrote:
>
>> Hi, Navneeth.
>>
>> If you *key* your stream using stream.keyBy(…), this will logically
>> split your input and all the records with the same key will be processed in
>> the same operator instance. This is the default behavior in Flink for keyed
>> streams and transparently handled.
>>
>> You can read more about it in the documentation [1].
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#keyed-state-and-operator-state
>>
>> On Thu, Apr 23, 2020 at 7:44 AM Navneeth Krishnan <
>> reachnavnee...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> Is there a way for an upstream operator to know how the downstream
>>> operator tasks are assigned? Basically I want to group my messages to be
>>> processed on slots in the same node based on some key.
>>>
>>> Thanks
>>>
>>


Re: Flink Forward 2020 Recorded Sessions

2020-04-28 Thread Marta Paes Moreira
Hi again,

You can find the first wave of recordings on Youtube already [1]. The
remainder will come over the course of the next few weeks.

[1] https://www.youtube.com/playlist?list=PLDX4T_cnKjD0ngnBSU-bYGfgVv17MiwA7

On Fri, Apr 24, 2020 at 3:23 PM Sivaprasanna 
wrote:

> Cool. Thanks for the information.
>
> On Fri, 24 Apr 2020 at 11:20 AM, Marta Paes Moreira 
> wrote:
>
>> Hi, Sivaprasanna.
>>
>> The talks will be up on Youtube sometime after the conference ends.
>>
>> Today, the starting schedule is different (9AM CEST / 12:30PM IST / 3PM
>> CST) and more friendly to Europe, India and China. Hope you manage to join
>> some sessions!
>>
>> Marta
>>
>> On Fri, 24 Apr 2020 at 06:58, Sivaprasanna 
>> wrote:
>>
>>> Hello,
>>>
>>> I had registered for the Flink Forward 2020 and had attended couple of
>>> sessions but due to the odd timings and overlapping sessions on the same
>>> slot, I wasn't able to attend some interesting talks. I have received mails
>>> with link to rewatch some 2-3 webinars but not all (that had happened yet).
>>> Where can I find the recorded sessions?
>>>
>>> Thanks,
>>> Sivaprasanna
>>>
>>


Re: Python UDF from Java

2020-04-30 Thread Marta Paes Moreira
Hi, Flavio.

Extending the scope of Python UDFs is described in FLIP-106 [1, 2] and is
planned for the upcoming 1.11 release, according to Piotr's last update.

Hope this addresses your question!

Marta

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-106%3A+Support+Python+UDF+in+SQL+Function+DDL
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-106-Support-Python-UDF-in-SQL-Function-DDL-td38107.html
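
To give a rough idea of what FLIP-106 enables, the DDL would look along these lines. This is only a sketch: the module, function and table names are made up, and the Python code still has to be made available to the cluster via the Python dependency configuration.

-- Register a Python UDF so it can be used from SQL / the Java Table API.
CREATE TEMPORARY FUNCTION my_upper AS 'my_module.my_upper' LANGUAGE PYTHON;

-- Then use it from a Java job like any other scalar function.
SELECT my_upper(name) FROM my_table;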

On Thu, Apr 30, 2020 at 11:30 PM Flavio Pompermaier 
wrote:

> Hi to all,
> is it possible to run a Python UDF from a Java job (using Table API or
> SQL)?
> Is there any reference?
>
> Best,
> Flavio
>


Re: Webinar: Unlocking the Power of Apache Beam with Apache Flink

2020-05-27 Thread Marta Paes Moreira
Thanks for sharing, Aizhamal - it was a great webinar!

Marta

On Wed, 27 May 2020 at 23:17, Aizhamal Nurmamat kyzy 
wrote:

> Thank you all for attending today's session! Here is the YT recording:
> https://www.youtube.com/watch?v=ZCV9aRDd30U
> And link to the slides:
> https://github.com/aijamalnk/beam-learning-month/blob/master/Unlocking%20the%20Power%20of%20Apache%20Beam%20with%20Apache%20Flink.pdf
>
> On Tue, May 26, 2020 at 8:32 AM Aizhamal Nurmamat kyzy <
> aizha...@apache.org> wrote:
>
>> Hi all,
>>
>> Please join our webinar this Wednesday at 10am PST/5:00pm GMT/1:00pm EST
>> where Max Michels - PMC member for Apache Beam and Apache Flink, will
>> deliver a talk about leveraging Apache Beam for large-scale stream and
>> batch analytics with Apache Flink.
>>
>> You can register via this link:
>> https://learn.xnextcon.com/event/eventdetails/W20052710
>>
>> Here is the short description of the talk:
>> ---
>> Apache Beam is a framework for writing stream and batch processing
>> pipelines using multiple languages such as Java, Python, SQL, or Go. Apache
>> Beam does not come with an execution engine of its own. Instead, it defers
>> the execution to its Runners which translate Beam pipelines for any
>> supported execution engine. Thus, users have complete control over the
>> language and the execution engine they use, without having to rewrite their
>> code.
>> In this talk, we will look at running Apache Beam pipelines with Apache
>> Flink. We will explain the concepts behind Apache Beam's portability
>> framework for multi-language support, and then show how to get started
>> running Java, Python, and SQL pipelines.
>> 
>> Links to the slides and recordings of this and previous webinars you can
>> find here: https://github.com/aijamalnk/beam-learning-month
>>
>> Hope y'all are safe,
>> Aizhamal
>>
>


Re: Installing Ververica, unable to write to file system

2020-05-28 Thread Marta Paes Moreira
Hi, Charlie.

This is not the best place for questions about Ververica Platform CE.
Please use community-edit...@ververica.com instead — someone will be able
to support you there!

If you have any questions related to Flink itself, feel free to reach out
to this mailing list again in the future.

Thanks,

Marta

On Wed, May 27, 2020 at 11:37 PM Corrigan, Charlie <
charlie.corri...@nordstrom.com> wrote:

> Hello, I’m trying to install Ververica (community edition for a simple poc
> deploy) via helm using these directions
> , but the pod is
> failing with the following error:
>
>
>
> ```
>
> org.springframework.context.ApplicationContextException: Unable to start
> web server; nested exception is
> org.springframework.boot.web.server.WebServerException: Unable to create
> tempDir. java.io.tmpdir is set to /tmp
>
> ```
>
>
>
> By default, our file system is immutable in k8s. Usually for this error,
> we’d mount an emptyDir volume. I’ve tried to do that in ververica’s
> values.yaml file, but I might be configuring it incorrectly. Here is the
> relevant portion of the values.yaml. I can include the entire file if it’s
> helpful. Any advice on how to alter these values or proceed with the
> ververica installation with a read only file system?
>
>
>
> volumes:
>   - name: tmp
> emptyDir: {}
>
>
>
> ## Container configuration for the appmanager component
> appmanager:
>   image:
> repository: registry.ververica.com/v2.1/vvp-appmanager
> tag: 2.1.0
> pullPolicy: Always
> volumeMounts:
>   - mountPath: /tmp
> name: tmp
>   resources:
> limits:
>   cpu: 1000m
>   memory: 1Gi
> requests:
>   cpu: 250m
>   memory: 1Gi
>
>   artifactFetcherTag: 2.1.0
>
>
>


Re: Is Flink HIPAA certified

2020-06-30 Thread Marta Paes Moreira
Hi, Prasanna.

We're not aware of any Flink users in the US healthcare space (as far as I
know).

I'm looping in Ryan from AWS, as he might be able to tell you more about
how you can become HIPAA-compliant with Flink [1].

Marta

[1]
https://docs.aws.amazon.com/kinesisanalytics/latest/java/akda-java-compliance.html

On Sat, Jun 27, 2020 at 9:41 AM Prasanna kumar <
prasannakumarram...@gmail.com> wrote:

> Hi Community ,
>
> Could anyone let me know if Flink is used in US healthcare tech space ?
>
> Thanks,
> Prasanna.
>


Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

2020-07-31 Thread Marta Paes Moreira
Hi, Jincheng!

Thanks for creating this detailed FLIP, it will make a big difference in
the experience of Python developers using Flink. I'm interested in
contributing to this work, so I'll reach out to you offline!

Also, thanks for sharing some information on the adoption of PyFlink, it's
great to see that there are already production users.

Marta

On Fri, Jul 31, 2020 at 5:35 AM Xingbo Huang  wrote:

> Hi Jincheng,
>
> Thanks a lot for bringing up this discussion and the proposal.
>
> Big +1 for improving the structure of PyFlink doc.
>
> It will be very friendly to give PyFlink users a unified entrance to learn
> PyFlink documents.
>
> Best,
> Xingbo
>
> Dian Fu  于2020年7月31日周五 上午11:00写道:
>
>> Hi Jincheng,
>>
>> Thanks a lot for bringing up this discussion and the proposal. +1 to
>> improve the Python API doc.
>>
>> I have received a lot of feedback from PyFlink beginners about
>> the PyFlink doc, e.g. the materials are too few, the Python doc is mixed
>> with the Java doc, and it's not easy to find the docs they want.
>>
>> I think it would greatly improve the user experience if we can have one
>> place which includes most knowledges PyFlink users should know.
>>
>> Regards,
>> Dian
>>
>> 在 2020年7月31日,上午10:14,jincheng sun  写道:
>>
>> Hi folks,
>>
>> Since the release of Flink 1.11, users of PyFlink have continued to grow.
>> As far as I know, many companies have used PyFlink for data analysis and
>> operations/maintenance monitoring, and have put it into production (such as
>> 聚美优品 [1] (Jumei), 浙江墨芷 [2] (Mozhi), etc.). According to
>> the feedback we received, the current documentation is not very friendly to
>> PyFlink users. There are two shortcomings:
>>
>> - Python related content is mixed in the Java/Scala documentation, which
>> makes it difficult for users who only focus on PyFlink to read.
>> - There is already a "Python Table API" section in the Table API document
>> to store PyFlink documents, but the number of articles is small and the
>> content is fragmented. It is difficult for beginners to learn from it.
>>
>> In addition, FLIP-130 introduced the Python DataStream API. Many
>> documents will be added for those new APIs. In order to increase the
>> readability and maintainability of the PyFlink document, Wei Zhong and me
>> have discussed offline and would like to rework it via this FLIP.
>>
>> We will rework the document around the following three objectives:
>>
>> - Add a separate section for Python API under the "Application
>> Development" section.
>> - Restructure current Python documentation to a brand new structure to
>> ensure complete content and friendly to beginners.
>> - Improve the documents shared by Python/Java/Scala to make it more
>> friendly to Python users and without affecting Java/Scala users.
>>
>> More detail can be found in the FLIP-133:
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-133%3A+Rework+PyFlink+Documentation
>>
>> Best,
>> Jincheng
>>
>> [1] https://mp.weixin.qq.com/s/zVsBIs1ZEFe4atYUYtZpRg
>> [2] https://mp.weixin.qq.com/s/R4p_a2TWGpESBWr3pLtM2g
>>
>>
>>


Re: Debezium Flink EMR

2020-08-21 Thread Marta Paes Moreira
Hi, Rex.

Part of what enabled CDC support in Flink 1.11 was the refactoring of the
table source interfaces (FLIP-95 [1]), and the new ScanTableSource
[2], which allows emitting bounded/unbounded streams with insert, update, and
delete rows.

In theory, you could consume data generated with Debezium as regular
JSON-encoded events before Flink 1.11 — there just wasn't a convenient way
to really treat it as "changelog". As a workaround, what you can do in
Flink 1.10 is process these messages as JSON and extract the "after" field
from the payload, and then apply de-duplication [3] to keep only the last
row.

The DDL for your source table would look something like:

CREATE TABLE tablename (
  ...
  after ROW(`field1` DATATYPE, `field2` DATATYPE, ...)
) WITH (
  'connector' = 'kafka',
  'format' = 'json',
  ...
);
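
For the de-duplication part, a minimal sketch of the query could look like the following. It assumes the table also declares a processing-time attribute (e.g. proctime AS PROCTIME()) and that `id` inside the `after` row is the primary key — both names are just illustrative:

SELECT id, field1, field2
FROM (
  SELECT
    id, field1, field2,
    -- keep only the latest row seen per key
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY proctime DESC) AS row_num
  FROM (
    SELECT after.id AS id, after.field1 AS field1, after.field2 AS field2, proctime
    FROM tablename
  )
)
WHERE row_num = 1;
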
Hope this helps!

Marta

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.11/api/java/org/apache/flink/table/connector/source/ScanTableSource.html
[3]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/sql/queries.html#deduplication


On Fri, Aug 21, 2020 at 10:28 AM Chesnay Schepler 
wrote:

> @Jark Would it be possible to use the 1.11 debezium support in 1.10?
>
> On 20/08/2020 19:59, Rex Fenley wrote:
>
> Hi,
>
> I'm trying to set up Flink with Debezium CDC Connector on AWS EMR,
> however, EMR only supports Flink 1.10.0, whereas Debezium Connector arrived
> in Flink 1.11.0, from looking at the documentation.
>
> https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-flink.html
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connectors/formats/debezium.html
>
> I'm wondering what alternative solutions are available for connecting
> Debezium to Flink? Is there an open source Debezium connector that works
> with Flink 1.10.0? Could I potentially pull the code out for the 1.11.0
> Debezium connector and compile it in my project using Flink 1.10.0 api?
>
> For context, I plan on doing some fairly complicated long lived stateful
> joins / materialization using the Table API over data ingested from
> Postgres and possibly MySQL.
>
> Appreciate any help, thanks!
>
> --
>
> Rex Fenley  |  Software Engineer - Mobile and Backend
>
>
> Remind.com  |  BLOG   |
>  FOLLOW US   |  LIKE US
> 
>
>
>


Re: Debezium Flink EMR

2020-08-24 Thread Marta Paes Moreira
Yes — you'll get the full row in the payload; and you can also access the
change operation, which might be useful in your case.

About performance, I'm summoning Kurt and @Jark Wu  to the
thread, who will be able to give you a more complete answer and likely also
some optimization tips for your specific use case.

Marta

On Fri, Aug 21, 2020 at 8:55 PM Rex Fenley  wrote:

> Yup! This definitely helps and makes sense.
>
> The 'after' payload comes with all data from the row, right? So essentially,
> for inserts and updates I can insert/replace data by pk, for null values I just
> delete by pk, and then I can build out the rest of my joins like normal.
>
> Are there any performance implications of doing it this way that is
> different from the out-of-the-box 1.11 solution?
>
> On Fri, Aug 21, 2020 at 2:28 AM Marta Paes Moreira 
> wrote:
>
>> Hi, Rex.
>>
>> Part of what enabled CDC support in Flink 1.11 was the refactoring of the
>> table source interfaces (FLIP-95 [1]), and the new ScanTableSource
>> [2], which allows to emit bounded/unbounded streams with insert, update and
>> delete rows.
>>
>> In theory, you could consume data generated with Debezium as regular
>> JSON-encoded events before Flink 1.11 — there just wasn't a convenient way
>> to really treat it as "changelog". As a workaround, what you can do in
>> Flink 1.10 is process these messages as JSON and extract the "after" field
>> from the payload, and then apply de-duplication [3] to keep only the last
>> row.
>>
>> The DDL for your source table would look something like:
>>
>> CREATE TABLE tablename (
>>   ...
>>   after ROW(`field1` DATATYPE, `field2` DATATYPE, ...)
>> ) WITH (
>>   'connector' = 'kafka',
>>   'format' = 'json',
>>   ...
>> );
>> Hope this helps!
>>
>> Marta
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
>> [2]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/api/java/org/apache/flink/table/connector/source/ScanTableSource.html
>> [3]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/sql/queries.html#deduplication
>>
>>
>> On Fri, Aug 21, 2020 at 10:28 AM Chesnay Schepler 
>> wrote:
>>
>>> @Jark Would it be possible to use the 1.11 debezium support in 1.10?
>>>
>>> On 20/08/2020 19:59, Rex Fenley wrote:
>>>
>>> Hi,
>>>
>>> I'm trying to set up Flink with Debezium CDC Connector on AWS EMR,
>>> however, EMR only supports Flink 1.10.0, whereas Debezium Connector arrived
>>> in Flink 1.11.0, from looking at the documentation.
>>>
>>> https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-flink.html
>>>
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connectors/formats/debezium.html
>>>
>>> I'm wondering what alternative solutions are available for connecting
>>> Debezium to Flink? Is there an open source Debezium connector that works
>>> with Flink 1.10.0? Could I potentially pull the code out for the 1.11.0
>>> Debezium connector and compile it in my project using Flink 1.10.0 api?
>>>
>>> For context, I plan on doing some fairly complicated long lived stateful
>>> joins / materialization using the Table API over data ingested from
>>> Postgres and possibly MySQL.
>>>
>>> Appreciate any help, thanks!
>>>
>>> --
>>>
>>> Rex Fenley  |  Software Engineer - Mobile and Backend
>>>
>>>
>>> Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>
>>>  |  FOLLOW US <https://twitter.com/remindhq>  |  LIKE US
>>> <https://www.facebook.com/remindhq>
>>>
>>>
>>>
>
> --
>
> Rex Fenley  |  Software Engineer - Mobile and Backend
>
>
> Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>  |
>  FOLLOW US <https://twitter.com/remindhq>  |  LIKE US
> <https://www.facebook.com/remindhq>
>


Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Marta Paes Moreira
Congrats, Dian!

On Thu, Aug 27, 2020 at 11:39 AM Yuan Mei  wrote:

> Congrats!
>
> On Thu, Aug 27, 2020 at 5:38 PM Xingbo Huang  wrote:
>
>> Congratulations Dian!
>>
>> Best,
>> Xingbo
>>
>> jincheng sun  于2020年8月27日周四 下午5:24写道:
>>
>>> Hi all,
>>>
>>> On behalf of the Flink PMC, I'm happy to announce that Dian Fu is now
>>> part of the Apache Flink Project Management Committee (PMC).
>>>
>>> Dian Fu has been very active on PyFlink component, working on various
>>> important features, such as the Python UDF and Pandas integration, and
>>> keeps checking and voting for our releases, and also has successfully
>>> produced two releases(1.9.3&1.11.1) as RM, currently working as RM to push
>>> forward the release of Flink 1.12.
>>>
>>> Please join me in congratulating Dian Fu for becoming a Flink PMC Member!
>>>
>>> Best,
>>> Jincheng(on behalf of the Flink PMC)
>>>
>>


Re: Debezium Flink EMR

2020-08-31 Thread Marta Paes Moreira
uot;,"optional":false,"field":"total_order"},{"type":"int64","optional":false,"field":"data_collection_order"}],"optional":true,"field":"transaction"}],"optional":false,"name":"dbserver1.inventory.addresses.Envelope"},"payload":{"before":null,"after":{"id":18,"customer_id":1004,"street":"111
>>>>>> cool street","city":"Big
>>>>>> City","state":"California","zip":"9","type":"BILLING"},"source":{"version":"1.2.1.Final","connector":"mysql","name":"dbserver1","ts_ms":1598651432000,"snapshot":"false","db":"inventory","table":"addresses","server_id":223344,"gtid":null,"file":"mysql-bin.10","pos":369,"row":0,"thread":5,"query":null},"op":"c","ts_ms":1598651432407,"transaction":null}}'.
>>>>>> flink-jobmanager_1 | at
>>>>>> org.apache.flink.formats.json.debezium.DebeziumJsonDeserializationSchema.deserialize(DebeziumJsonDeserializationSchema.java:136)
>>>>>> ~[flink-json-1.11.1.jar:1.11.1]
>>>>>> flink-jobmanager_1 | at
>>>>>> org.apache.flink.streaming.connectors.kafka.internals.KafkaDeserializationSchemaWrapper.deserialize(KafkaDeserializationSchemaWrapper.java:56)
>>>>>> ~[?:?]
>>>>>> flink-jobmanager_1 | at
>>>>>> org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.partitionConsumerRecordsHandler(KafkaFetcher.java:181)
>>>>>> ~[?:?]
>>>>>> flink-jobmanager_1 | at
>>>>>> org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:141)
>>>>>> ~[?:?]
>>>>>> flink-jobmanager_1 | at
>>>>>> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:755)
>>>>>> ~[?:?]
>>>>>> flink-jobmanager_1 | at
>>>>>> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
>>>>>> ~[flink-dist_2.12-1.11.1.jar:1.11.1]
>>>>>> flink-jobmanager_1 | at
>>>>>> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
>>>>>> ~[flink-dist_2.12-1.11.1.jar:1.11.1]
>>>>>> flink-jobmanager_1 | at
>>>>>> org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:201)
>>>>>> ~[flink-dist_2.12-1.11.1.jar:1.11.1]
>>>>>> flink-jobmanager_1 | Caused by: java.lang.NullPointerException
>>>>>> flink-jobmanager_1 | at
>>>>>> org.apache.flink.formats.json.debezium.DebeziumJsonDeserializationSchema.deserialize(DebeziumJsonDeserializationSchema.java:115)
>>>>>> ~[flink-json-1.11.1.jar:1.11.1]
>>>>>> flink-jobmanager_1 | at
>>>>>> org.apache.flink.streaming.connectors.kafka.internals.KafkaDeserializationSchemaWrapper.deserialize(KafkaDeserializationSchemaWrapper.java:56)
>>>>>> ~[?:?]
>>>>>> flink-jobmanager_1 | at
>>>>>> org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.partitionConsumerRecordsHandler(KafkaFetcher.java:181)
>>>>>> ~[?:?]
>>>>>> flink-jobmanager_1 | at
>>>>>> org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:141)
>>>>>> ~[?:?]
>>>>>> flink-jobmanager_1 | at
>>>>>> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:755)
>>>>>> ~[?:?]
>>>>>> flink-jobmanager_1 | at
>>>>>> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
>>>>>> ~[flink-dist_2.12-1.11.1.jar:1.11.1]
>>>>>> flink-jobmanager_1 | at
>>>>>> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
>>>>>> ~[flink-dist_2.12-1.11.1.jar:1.11.1]
>>>>>> flink-jobmanager_1 | at
>>>>>> org.apache.flink.streaming.runtime.tasks.SourceStrea

Re: Does Flink support such a feature currently?

2020-09-22 Thread Marta Paes Moreira
Hi, Roc.

*Note:* in the future, please send this type of question to the user
mailing list instead (user@flink.apache.org)!

If I understand your question correctly, this is possible using the LIKE
clause and a registered catalog. There is currently no implementation for
the MySQL JDBC catalog, but this is on the roadmap [1,2].

Once you register a catalog, you could do:

CREATE TABLE mapping_table
WITH (
  ...
)
LIKE full_path_to_source_table;
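
For reference, registering the (currently Postgres-only) JDBC catalog in Flink 1.11 looks roughly like the sketch below — the connection values are placeholders:

CREATE CATALOG my_catalog WITH (
  'type' = 'jdbc',
  'default-database' = 'mydb',
  'username' = 'postgres',
  'password' = '...',
  'base-url' = 'jdbc:postgresql://localhost:5432/'
);

USE CATALOG my_catalog;
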
Again, as of Flink 1.11 this only works for Postgres, not yet MySQL. I'm
copying in Bowen as he might be able to give more information on the
roadmap.

Marta

[1] https://issues.apache.org/jira/browse/FLINK-15352
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-93%3A+JDBC+catalog+and+Postgres+catalog

On Tue, Sep 22, 2020 at 9:44 AM Roc Marshal  wrote:

> Hi, everyone!
>
>When using flink sql DDL to create a mysql mapping table, does
> flink support the automatic rendering of the target table schema if we put
> no column-names in `create table table_name_mapping2mysql () with (...)`?
> If this feature is not supported, is it necessary to consider improving it?
>
> Thank you.
> Best, Roc.


Re: [DISCUSS] Create a Flink ecosystem website

2019-07-19 Thread Marta Paes Moreira
Hey, Robert.

I will keep an eye on the overall progress and get started on the blog post
to make the community announcement. Are there (mid-term) plans to
translate/localize this website as well? It might be a point worth
mentioning in the blogpost.

Hats off to you and Daryl — this turned out amazing!

Marta

On Thu, Jul 18, 2019 at 10:57 AM Congxian Qiu 
wrote:

> Robert and Daryl, thanks for the great work, I tried the website and filed
> some issues on Github.
> Best,
> Congxian
>
>
> Robert Metzger  于2019年7月17日周三 下午11:28写道:
>
>> Hey all,
>>
>> Daryl and I have great news to share. We are about to finish adding the
>> basic features to the ecosystem page.
>> We are at a stage where it is ready to be reviewed and made public.
>>
>> You can either check out a development instance of the ecosystem page
>> here: https://flink-ecosystem-demo.flink-resources.org/
>> Or you run it locally, with the instructions from the README.md:
>> https://github.com/sorahn/flink-ecosystem
>>
>> Please report all issues you find here:
>> https://github.com/sorahn/flink-ecosystem/issues or in this thread.
>>
>> The next steps in this project are the following:
>> - We fix all issues reported through this testing
>> - We set up the site on the INFRA resources Becket has secured [1], do
>> some further testing (including email notifications) and pre-fill the page
>> with some packages.
>> - We set up a packages.flink.apache.org or flink.apache.org/packages
>> domain
>> - We announce the packages through a short blog post
>>
>> Happy testing!
>>
>> Best,
>> Robert
>>
>> [1] https://issues.apache.org/jira/browse/INFRA-18010
>>
>>
>> On Thu, Apr 25, 2019 at 6:23 AM Becket Qin  wrote:
>>
>>> Thanks for the update, Robert. Looking forward to the website. If there
>>> is already a list of software we need to run the website, we can ask Apache
>>> infra team to prepare the VM for us, as that may also take some time.
>>>
>>> On Wed, Apr 24, 2019 at 11:57 PM Robert Metzger 
>>> wrote:
>>>
 Hey all,

 quick update on this project: The frontend and backend code have been
 put together into this repository:
 https://github.com/sorahn/flink-ecosystem
 We also just agreed on an API specification, and will now work on
 finishing the backend.

 It will probably take a few more weeks for this to finish, but we are
 making progress :)

 Best,
 Robert


 On Mon, Apr 15, 2019 at 11:18 AM Robert Metzger 
 wrote:

> Hey Daryl,
>
> thanks a lot for posting a link to this first prototype on the mailing
> list! I really like it!
>
> Becket: Our plan forward is that Congxian is implementing the backend
> for the website. He has already started with the work, but needs at least
> one more week.
>
>
> [Re-sending this email because the first one was blocked on dev@f.a.o]
>
>
> On Mon, Apr 15, 2019 at 7:59 AM Becket Qin 
> wrote:
>
>> Hi Daryl,
>>
>> Thanks a lot for the update. The site looks awesome! This is a great
>> progress. I really like the conciseness of GUI.
>>
>> One minor suggestion is that for the same library, there might be
>> multiple versions compatible with different Flink versions. It would be
>> good to show that somewhere in the project page as it seems important to
>> the users.
>>
>> BTW, will you share the plan to move forward? Would additional hands
>> help?
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>> On Sat, Apr 13, 2019 at 7:10 PM Daryl Roberts 
>> wrote:
>>
>>> > Shall we add a guide page to show people how to publish their
>>> projects to the website? The exact rules can be discussed and drafted 
>>> in a
>>> separate email thread IMO
>>>
>>> This is a good idea. (Both the guide, and the separate thread.) I think
>>> once there is an actual package in place we'll be in a lot better
>>> position to discuss this.
>>>
>>> > The "Log in with Github" link doesn't seem to work yet. Will it
>>> only allow login for admins and publishers, or for everyone?
>>>
>>> Correct, all the oauth stuff requires a real server. We are
>>> currently just faking everything.
>>>
>>> I will add a mock-login page (username/password that just accepts
>>> anything and displays whatever username you type in) so we can see the
>>> add-comment field and add-packages page once they exist.
>>>
>>>
>>>
>>>