Re: [FEEDBACK REQUEST] Re: [ANNOUNCEMENT] Nexmark included to the CI

2018-11-27 Thread Alex Amato
It would be great to add some lower-level benchmark tests for the Java SDK.
I was thinking of using OpenCensus for collecting benchmarks, which looks
easy to use and should be license-compatible. I'm just not sure about how to
export the results so that we can display them on the perfkit dashboard for
everyone to see.

Is there an example PR for this part? Can we write to this data store for
this perfkit dashboard easily?

https://github.com/census-instrumentation/opencensus-java
https://github.com/census-instrumentation/opencensus-java/tree/master/exporters/trace/zipkin#quickstart
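Independent of the OpenCensus specifics, the lower-level benchmarks described here boil down to recording latency samples and exporting them in a shape a dashboard backend could ingest. A minimal sketch of that idea (all names and the JSON-lines export format are hypothetical, not the perfkit schema):

```python
import json
import time


def benchmark(fn, iterations=1000):
    """Time fn over a number of iterations and return summary stats
    as one record per benchmark run, suitable for a dashboard to ingest."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    samples.sort()
    return {
        "iterations": iterations,
        "min_sec": samples[0],
        "median_sec": samples[len(samples) // 2],
        "max_sec": samples[-1],
    }


if __name__ == "__main__":
    result = benchmark(lambda: sum(range(1000)), iterations=100)
    # Export as a JSON line; a real exporter would push this to a data store.
    print(json.dumps(result))
```

Whatever instrumentation library is chosen, the open question above is only the export step: mapping records like this onto whatever store the perfkit dashboard reads from.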




On Thu, Jul 19, 2018 at 1:28 PM Andrew Pilloud  wrote:

> The doc changes look good to me, I'll add Dataflow once it is ready.
> Thanks for opening the issue on the DirectRunner. I'll try to get some
> progress on a dedicated perf node while you are gone, we can talk about
> increasing the size of the nexmark input collection for the runs once we
> know what the utilization on that looks like.
>
> Enjoy your time off!
>
>
> Andrew
>
> On Thu, Jul 19, 2018 at 9:00 AM Etienne Chauchot 
> wrote:
>
>> Hi guys,
>> As suggested by Anton below, I opened a PR on the website to reference
>> the Nexmark dashboards.
>> As I did not want users to take them as proper neutral benchmarks of the
>> runners/engines, but rather as a piece of CI software, I added a
>> disclaimer.
>>
>> Please:
>> - tell us whether you agree with publishing such performance results
>> - comment on the PR for the disclaimer.
>>
>> PR: https://github.com/apache/beam-site/pull/500
>>
>> Thanks
>>
>> Etienne
>>
>>
>> On Thursday, July 19, 2018 at 12:30 +0200, Etienne Chauchot wrote:
>>
>> Hi Anton,
>>
>> Yes, good idea, I'll update nexmark website page
>>
>> Etienne
>>
>> On Wednesday, July 18, 2018 at 10:17 -0700, Anton Kedin wrote:
>>
>> These dashboards look great!
>>
>> Can we publish the links to the dashboards somewhere, for better visibility?
>> E.g. in the jenkins website / emails, or the wiki.
>>
>> Regards,
>> Anton
>>
>> On Wed, Jul 18, 2018 at 10:08 AM Andrew Pilloud 
>> wrote:
>>
>> Hi Etienne,
>>
>> I've been asking around and it sounds like we should be able to get a
>> dedicated Jenkins node for performance tests. Another thing that might help
>> is making the runs a few times longer. They are currently running around 2
> seconds each, so the total build time probably exceeds the time spent testing.
>> Internally at Google we are running them with 2000x as many events on
>> Dataflow, but a job of that size won't even complete on the Direct Runner.
>>
>> I didn't see the query 3 issues, but now that you point it out it looks
>> like a bug to me too.
>>
>> Andrew
>>
>> On Wed, Jul 18, 2018 at 1:13 AM Etienne Chauchot 
>> wrote:
>>
>> Hi Andrew,
>>
>> Yes I saw that, except dedicating jenkins nodes to nexmark, I see no
>> other way.
>>
>> Also, did you see query 3 output size on direct runner? Should be a
>> straight line and it is not; I'm wondering if there is a problem with state
>> and timers impl in direct runner.
>>
>> Etienne
>>
>> On Tuesday, July 17, 2018 at 11:38 -0700, Andrew Pilloud wrote:
>>
>> I'm noticing the graphs are really noisy. It looks like we are running
>> these on shared Jenkins executors, so our perf tests are fighting with
>> other builds for CPU. I've opened an issue
>> https://issues.apache.org/jira/browse/BEAM-4804 and am wondering if
>> anyone knows an easy fix to isolate these jobs.
>>
>> Andrew
>>
>> On Fri, Jul 13, 2018 at 2:39 AM Łukasz Gajowy  wrote:
>>
>> @Etienne: Nice to see the graphs! :)
>>
>> @Ismael: Good idea, there's no document yet. I think we could create a
>> small google doc with instructions on how to do this.
>>
>> On Fri, Jul 13, 2018 at 10:46, Etienne Chauchot wrote:
>>
>> Hi,
>>
>> @Andrew, this is because I did not find a way to set 2 scales on the Y
>> axis on the perfkit graphs. Indeed numResults varies from 1 to 100 000 and
>> runtimeSec is usually below 10s.
>>
>> Etienne
>>
>> On Thursday, July 12, 2018 at 12:04 -0700, Andrew Pilloud wrote:
>>
>> This is great, should make performance work much easier! I'm going to get
>> the Beam SQL Nexmark jobs publishing as well. (Opened
>> https://issues.apache.org/jira/browse/BEAM-4774 to track.) I might take
>> on the Dataflow runner as well if no one else volunteers.
>>
>> I am curious as to why you have two separate graphs for runtime and count
>> rather than graphing runtime/count to get the throughput rate for each run?
>> Or should that be a third graph? Looks like it would just be a small tweak
>> to the query in perfkit.
>>
>>
>>
>> Andrew
>>
>> On Thu, Jul 12, 2018 at 11:40 AM Pablo Estrada 
>> wrote:
>>
>> This is really cool Etienne : ) thanks for working on this.
>> Out of curiosity, do you know how often the tests run on each runner?
>>
>> Best
>> -P.
>>
>> On Thu, Jul 12, 2018 at 2:15 AM Romain Manni-Bucau 
>> wrote:
>>
>> Awesome Etienne, this is really important for the (user) community to
>> have that visibility since it is one of the most 

Re: [DISCUSS] SplittableDoFn Java SDK User Facing API

2018-11-27 Thread Lukasz Cwik
I updated the PR addressing the last of Scott's comments and also migrated
to use an integral fraction as Robert had recommended by using approach A
for the proto representation and BigDecimal within the Java SDK:
A:
// Represents a non-negative decimal number: unscaled_value * 10^(-scale)
message Decimal {
  // Represents the unscaled value as a big endian unlimited precision
non-negative integer.
  bytes unscaled_value = 1;
  // Represents the scale
  uint32 scale = 2;
}
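For illustration, the proto encoding above round-trips cleanly: split a number into a big-endian unscaled value plus a scale, and reconstruct `unscaled_value * 10^(-scale)`. A sketch in Python (the Java SDK uses BigDecimal; this just mirrors the same split, and ignores special values like NaN or infinity):

```python
from decimal import Decimal


def encode(d: Decimal):
    """Split a non-negative Decimal into (unscaled big-endian bytes, scale),
    mirroring the Decimal proto message above."""
    sign, digits, exponent = d.as_tuple()
    assert sign == 0, "the proto represents non-negative numbers only"
    assert exponent <= 0, "a uint32 scale cannot encode a negative scale"
    unscaled = int("".join(map(str, digits)))
    scale = -exponent
    nbytes = max(1, (unscaled.bit_length() + 7) // 8)
    return unscaled.to_bytes(nbytes, "big"), scale


def decode(unscaled_value: bytes, scale: int) -> Decimal:
    """Reconstruct unscaled_value * 10^(-scale)."""
    return Decimal(int.from_bytes(unscaled_value, "big")).scaleb(-scale)


# Round trip: 1250.75 -> (b'\x01\xe8\xc3', 2) -> 1250.75
assert decode(*encode(Decimal("1250.75"))) == Decimal("1250.75")
```

The big-endian unsigned encoding means leading zero bytes are harmless on decode, which keeps the representation unambiguous across SDKs.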

Ismael, I would like to defer the changes to improve the ByteBuddy
DoFnInvoker since that is parallelizable work and have filed BEAM-6142.

I don't believe there are any other outstanding changes and would like to
get the PR merged so that people can start working on implementing support
for backlog reporting and splitting within the Java SDK harness, improving
the ByteBuddy DoFnInvoker, exposing the shared runner library parts, and
integrating this into ULR, Flink, Dataflow, ...

On Mon, Nov 26, 2018 at 9:49 AM Lukasz Cwik  wrote:

>
>
> On Mon, Nov 26, 2018 at 9:09 AM Ismaël Mejía  wrote:
>
>> > Bundle finalization is unrelated to backlogs but is needed since there
>> is a class of data stores which need acknowledgement that says I have
>> successfully received your data and am now responsible for it such as
>> acking a message from a message queue.
>>
>> Currently acking is done by IOs as part of checkpointing. How will this
>> be different? Can you please clarify how it should be done in this case,
>> or is this totally independent?
>>
>
> The flow for finalization and checkpointing is similar:
> Checkpointing:
> 1) Process a bundle
> 2) Checkpoint bundle containing acks that need to be done
> 3) When checkpoint resumes, acknowledge messages
>
> Finalization:
> 1) Process a bundle
> 2) Request bundle finalization when bundle completes
> 3) SDK is asked to finalize bundle
>
> The difference between the two is that bundle finalization always goes
> back to the same machine instance that processed the bundle while
> checkpointing can be scheduled on another machine. Many message queue like
> systems expose clients which store in memory state and can't ack from
> another machine. You could solve the problem with checkpointing, but it
> would require each machine to be able to tell another machine that it got a
> checkpoint with acks it is responsible for, and this won't work
> everywhere and isn't as clean.
>
>
>> > UnboundedPerElement/BoundedPerElement tells us during pipeline
>> construction time what type of PCollection we will be creating since we may
>> have a bounded PCollection go to an UnboundedPerElement DoFn and that will
>> produce an unbounded PCollection and similarly we could have an unbounded
>> PCollection go to a BoundedPerElement DoFn and that will produce an
>> unbounded PCollection. Restrictions.IsBounded is used during pipeline
>> execution to inform the runner whether a restriction being returned is
>> bounded or not since unbounded restrictions can return bounded restrictions
>> during splitting. So in the above example using the message queue, the
>> first 7 restrictions that only read 1250 messages would be marked with the
>> Restrictions.IsBounded interface while the last one would not be. This
>> could also be a method on restrictions such as "IsBounded isBounded()" on
>> Pcollections.
>>
>> Thanks for the explanation about Restrictions.IsBounded, since this is
>> information for the runner, what is the runner expected to do
>> differently when IsUnbounded? (I assume that IsBounded is the default
>> behavior and nothing changes.)
>>
>
> Knowing whether a restriction is bounded or unbounded is important, one
> example use case would be for the limited depth splitting proposal (
> https://docs.google.com/document/d/1cKOB9ToasfYs1kLWQgffzvIbJx2Smy4svlodPRhFrk4/edit#heading=h.wkwslng744mv)
> since you want to keep the unbounded restrictions at level 0 and only pass
> the bounded restrictions to the other levels. The reasoning behind this is
> that you don't want to end up in a state where all your unbounded
> restrictions are at the highest level preventing you from splitting any
> further.
>
>
>> > Note that this does bring up the question of whether SDKs should expose
>> coders for backlogs since ByteKeyCoder and BigEndianLongCoder exist which
>> would cover a good number of scenarios described above. This coder doesn't
>> have to be understood by the runner nor does it have to be part of the
>> portability APIs (either Runner or Fn API). WDYT?
>>
>> Yes, we may need a Coder effectively for both sides; the only thing I don't
>> like is the external impact on the API. I mean it is not too complex, but
>> it adds some extras to support things that are 'rarely' changed.
>>
>
> Based upon Robert's suggestion above to swap to using an integral floating
> point number, and even without Robert's suggestion, this won't work. The idea
> was that a coder would help convert the byte[] backlog representation
> to/from a type the user wants but the 

Re: Confluence edit access

2018-11-27 Thread Lukasz Cwik
I have granted you access.

On Tue, Nov 27, 2018 at 2:39 PM Huygaa Batsaikhan  wrote:

> Hi devs, can anyone grant me edit access? I would like to update
> https://cwiki.apache.org/confluence/display/BEAM/Usage+Guide.
>
> Thanks.
>


Re: TextIO setting file dynamically issue

2018-11-27 Thread Lukasz Cwik
+u...@beam.apache.org 

On Mon, Nov 26, 2018 at 5:33 PM Reuven Lax  wrote:

> Do you need it to change based on the timestamps of the records being
> processed, or based on actual current time?
>
> On Mon, Nov 26, 2018 at 5:30 PM Matthew Schneid <
> matthew.t.schn...@gmail.com> wrote:
>
>> Hello,
>>
>>
>>
>> I have an interesting issue that I can’t seem to find a reliable
>> resolution to.
>>
>>
>>
>> I have a standard TextIO output that looks like the following:
>>
>>
>>
>> TextIO.write().to("gs://" + new DateTime().toString("HH-mm-ss")
>> + "/Test-")
>>
>>
>>
>> The above works, and writes to GCS, as I expect it to.
>>
>>
>>
>> However, it retains the instantiated datetime value, and what I need to
>> happen is for it to dynamically change with the current time.
>>
>>
>>
>> Is this possible?
>>
>>
>>
>> Thanks for any and all help that can be provided.
>>
>>
>>
>> V/R
>>
>>
>>
>> MS
>>
>
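The behavior described in this thread follows from when the path expression is evaluated: the string concatenation runs once, at pipeline construction, so the timestamp is frozen. Getting a path that tracks current time requires something evaluated at write time (in Beam, roughly what a per-window filename policy provides). A toy sketch of that eager-vs-lazy distinction, outside Beam, with a made-up bucket name and a fake clock:

```python
from datetime import datetime


def eager_path(now=datetime.now):
    # Evaluated once, up front -- this mirrors what the string
    # concatenation inside TextIO.write().to(...) does at construction time.
    return "gs://bucket/" + now().strftime("%H-%M-%S") + "/Test-"


class LazyPath:
    """Evaluated every time it is rendered -- the behavior a per-window
    filename policy would give at write time."""

    def __init__(self, now=datetime.now):
        self.now = now

    def render(self):
        return "gs://bucket/" + self.now().strftime("%H-%M-%S") + "/Test-"


if __name__ == "__main__":
    # Fake clock makes the difference deterministic: two successive reads.
    ticks = iter([datetime(2018, 11, 27, 10, 0, 0),
                  datetime(2018, 11, 27, 10, 0, 5)])
    clock = lambda: next(ticks)
    frozen = eager_path(clock)  # captures 10-00-00 forever
    lazy = LazyPath(clock)
    print(frozen)         # gs://bucket/10-00-00/Test-
    print(lazy.render())  # gs://bucket/10-00-05/Test-
```

The question Reuven raises (event time vs. wall-clock time) decides which "clock" the lazy variant should consult.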


Re: Build fail on Python SDK

2018-11-27 Thread Jean-Baptiste Onofré
Thanks,

let me run a new build with the --scan option.

I keep you posted.

Regards
JB

On 27/11/2018 18:46, Ahmet Altay wrote:
> I have not seen this one before. Could you share the link to the gradle
> build scan? Perhaps that will have more information.
> 
> On Tue, Nov 27, 2018 at 6:42 AM, Jean-Baptiste Onofré wrote:
> 
> Hi guys
> 
> I have some test failures on Python SDK (19 failures exactly).
> 
> They all look the same, for instance:
> 
> FAIL: testIndexing
> (apache_beam.typehints.trivial_inference_test.TrivialInferenceTest)
> --
> Traceback (most recent call last):
>   File
> 
> "/home/jbonofre/Workspace/beam/sdks/python/apache_beam/typehints/trivial_inference_test.py",
> line 41, in testIndexing
>     self.assertReturnType(int, lambda x: x[0], [typehints.Tuple[int,
> str]])
>   File
> 
> "/home/jbonofre/Workspace/beam/sdks/python/apache_beam/typehints/trivial_inference_test.py",
> line 35, in assertReturnType
>     self.assertEquals(expected, trivial_inference.infer_return_type(f,
> inputs))
> AssertionError:  != Any
> 
> 
> Does someone else have the same issue ?
> 
> Thanks,
> Regards
> JB
> -- 
> Jean-Baptiste Onofré
> jbono...@apache.org 
> http://blog.nanthrax.net
> Talend - http://www.talend.com
> 
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: org.apache.beam.runners.flink.PortableTimersExecutionTest is very flakey

2018-11-27 Thread Kenneth Knowles
I actually didn't look at this one. I filed a bunch more adjacent flake
bugs. I didn't find your bug but I do see that test flaking at the same
time as the others. FWIW here is the list of flakes and sickbayed tests:
https://issues.apache.org/jira/issues/?filter=12343195

Kenn

On Tue, Nov 27, 2018 at 10:25 AM Alex Amato  wrote:

> +Ken,
>
> Did you happen to look into this test? I heard that you may have been
> looking into this.
>
> On Mon, Nov 26, 2018 at 3:36 PM Maximilian Michels  wrote:
>
>> Hi Alex,
>>
>> Thanks for your help! I'm quite used to debugging concurrent/distributed
>> problems. But this one is quite tricky, especially with regards to GRPC
>> threads. I try to provide more information in the following.
>>
>> There are two observations:
>>
>> 1) The problem is specifically related to how the cleanup is performed
>> for the EmbeddedEnvironmentFactory. The environment is shut down when the
>> SDK Harness exits, but the GRPC threads continue to linger for some time
>> and may stall state processing on the next test.
>>
>> If you do _not_ close DefaultJobBundleFactory, which happens during
>> close() or dispose() in the FlinkExecutableStageFunction or
>> ExecutableStageDoFnOperator respectively, the tests run just fine. I ran
>> 1000 test runs without a single failure.
>>
>> The EmbeddedEnvironment uses direct channels which are marked
>> experimental in GRPC. We may have to convert them to regular socket
>> communication.
>>
>> 2) Try setting a conditional breakpoint in GrpcStateService which will
>> never break, e.g. "false". Set it here:
>>
>> https://github.com/apache/beam/blob/6da9aa5594f96c0201d497f6dce4797c4984a2fd/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/state/GrpcStateService.java#L134
>>
>> The tests will never fail. The SDK harness is always shutdown correctly
>> at the end of the test.
>>
>> Thanks,
>> Max
>>
>> On 26.11.18 19:15, Alex Amato wrote:
>> > Thanks Maximilian, let me know if you need any help. Usually I debug
>> > this sort of thing by pausing the IntelliJ debugger to see all the
>> > different threads which are waiting on various conditions. If you find
>> > any insights from that, please post them here and we can try to figure
>> > out the source of the stuckness. Perhaps it may be some concurrency
>> > issue leading to deadlock?
>> >
>> > On Thu, Nov 22, 2018 at 12:57 PM Maximilian Michels > > > wrote:
>> >
>> > I couldn't fix it thus far. The issue does not seem to be in the
>> Flink
>> Runner but in the way the test utilizes the EMBEDDED environment to
>> > run
>> > multiple portable jobs in a row.
>> >
>> > When it gets stuck it is in RemoteBundle#close and it is
>> independent of
>> > the test type (batch and streaming have different implementations).
>> >
>> > Will give it another look tomorrow.
>> >
>> > Thanks,
>> > Max
>> >
>> > On 22.11.18 13:07, Maximilian Michels wrote:
>> >  > Hi Alex,
>> >  >
>> >  > The test seems to have gotten flaky after we merged support for
>> > portable
>> >  > timers in Flink's batch mode.
>> >  >
>> >  > Looking into this now.
>> >  >
>> >  > Thanks,
>> >  > Max
>> >  >
>> >  > On 21.11.18 23:56, Alex Amato wrote:
>> >  >> Hello, I have noticed
>> >  >> that org.apache.beam.runners.flink.PortableTimersExecutionTest
>> > is very
>> >  >> flakey, and repro'd this test timeout on the master branch in
>> > 40/50 runs.
>> >  >>
>> >  >> I filed a JIRA issue: BEAM-6111
>> >  >> . I was just
>> >  >> wondering if anyone knew why this may be occurring, and to
>> check if
>> >  >> anyone else has been experiencing this.
>> >  >>
>> >  >> Thanks,
>> >  >> Alex
>> >
>>
>


Re: org.apache.beam.runners.flink.PortableTimersExecutionTest is very flakey

2018-11-27 Thread Alex Amato
+Ken,

Did you happen to look into this test? I heard that you may have been
looking into this.

On Mon, Nov 26, 2018 at 3:36 PM Maximilian Michels  wrote:

> Hi Alex,
>
> Thanks for your help! I'm quite used to debugging concurrent/distributed
> problems. But this one is quite tricky, especially with regards to GRPC
> threads. I try to provide more information in the following.
>
> There are two observations:
>
> 1) The problem is specifically related to how the cleanup is performed
> for the EmbeddedEnvironmentFactory. The environment is shut down when the
> SDK Harness exits, but the GRPC threads continue to linger for some time
> and may stall state processing on the next test.
>
> If you do _not_ close DefaultJobBundleFactory, which happens during
> close() or dispose() in the FlinkExecutableStageFunction or
> ExecutableStageDoFnOperator respectively, the tests run just fine. I ran
> 1000 test runs without a single failure.
>
> The EmbeddedEnvironment uses direct channels which are marked
> experimental in GRPC. We may have to convert them to regular socket
> communication.
>
> 2) Try setting a conditional breakpoint in GrpcStateService which will
> never break, e.g. "false". Set it here:
>
> https://github.com/apache/beam/blob/6da9aa5594f96c0201d497f6dce4797c4984a2fd/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/state/GrpcStateService.java#L134
>
> The tests will never fail. The SDK harness is always shutdown correctly
> at the end of the test.
>
> Thanks,
> Max
>
> On 26.11.18 19:15, Alex Amato wrote:
> > Thanks Maximilian, let me know if you need any help. Usually I debug
> > this sort of thing by pausing the IntelliJ debugger to see all the
> > different threads which are waiting on various conditions. If you find
> > any insights from that, please post them here and we can try to figure
> > out the source of the stuckness. Perhaps it may be some concurrency
> > issue leading to deadlock?
> >
> > On Thu, Nov 22, 2018 at 12:57 PM Maximilian Michels  > > wrote:
> >
> > I couldn't fix it thus far. The issue does not seem to be in the
> Flink
> > Runner but in the way the test utilizes the EMBEDDED environment to
> > run
> > multiple portable jobs in a row.
> >
> > When it gets stuck it is in RemoteBundle#close and it is independent
> of
> > the test type (batch and streaming have different implementations).
> >
> > Will give it another look tomorrow.
> >
> > Thanks,
> > Max
> >
> > On 22.11.18 13:07, Maximilian Michels wrote:
> >  > Hi Alex,
> >  >
> >  > The test seems to have gotten flaky after we merged support for
> > portable
> >  > timers in Flink's batch mode.
> >  >
> >  > Looking into this now.
> >  >
> >  > Thanks,
> >  > Max
> >  >
> >  > On 21.11.18 23:56, Alex Amato wrote:
> >  >> Hello, I have noticed
> >  >> that org.apache.beam.runners.flink.PortableTimersExecutionTest
> > is very
> >  >> flakey, and repro'd this test timeout on the master branch in
> > 40/50 runs.
> >  >>
> >  >> I filed a JIRA issue: BEAM-6111
> >  >> . I was just
> >  >> wondering if anyone knew why this may be occurring, and to check
> if
> >  >> anyone else has been experiencing this.
> >  >>
> >  >> Thanks,
> >  >> Alex
> >
>


Re: Build fail on Python SDK

2018-11-27 Thread Ahmet Altay
I have not seen this one before. Could you share the link to the gradle build
scan? Perhaps that will have more information.

On Tue, Nov 27, 2018 at 6:42 AM, Jean-Baptiste Onofré 
wrote:

> Hi guys
>
> I have some test failures on Python SDK (19 failures exactly).
>
> They all look the same, for instance:
>
> FAIL: testIndexing
> (apache_beam.typehints.trivial_inference_test.TrivialInferenceTest)
> --
> Traceback (most recent call last):
>   File
> "/home/jbonofre/Workspace/beam/sdks/python/apache_beam/
> typehints/trivial_inference_test.py",
> line 41, in testIndexing
> self.assertReturnType(int, lambda x: x[0], [typehints.Tuple[int, str]])
>   File
> "/home/jbonofre/Workspace/beam/sdks/python/apache_beam/
> typehints/trivial_inference_test.py",
> line 35, in assertReturnType
> self.assertEquals(expected, trivial_inference.infer_return_type(f,
> inputs))
> AssertionError:  != Any
>
>
> Does someone else have the same issue ?
>
> Thanks,
> Regards
> JB
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: MongoDbIO

2018-11-27 Thread Jean-Baptiste Onofré
Hi Chaim,

do you mean that MongoDbIO could provide a PCollection
containing the payload + id + time ?

Regards
JB

On 27/11/2018 12:47, Chaim Turkel wrote:
> Hi,
>   I would like to write a sync to validate that I have all records
> from mongo in my BigQuery.
> To do this I would like to bring the fields id,time from mongo to
> BigQuery, and only for the missing documents read the full
> document.
> I did not see a way to bring a partial document?
> 
> chaim
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: contributor in the Beam

2018-11-27 Thread Jean-Baptiste Onofré
Hi Chaim,

The best is to create a Jira describing the new features you want to
add. Then, you can create a PR related to this Jira.

As I'm the original MongoDbIO author, I would be more than happy to help
you and review the PR.

Thanks !
Regards
JB

On 27/11/2018 15:37, Chaim Turkel wrote:
> Hi,
>   I have added a few features to the MongoDbIO and would like to add
> them to the project.
> I have read https://beam.apache.org/contribute/
> I have added a Jira user; what do I need to do next?
> 
> chaim
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Build fail on Python SDK

2018-11-27 Thread Jean-Baptiste Onofré
Hi guys

I have some test failures on Python SDK (19 failures exactly).

They all look the same, for instance:

FAIL: testIndexing
(apache_beam.typehints.trivial_inference_test.TrivialInferenceTest)
--
Traceback (most recent call last):
  File
"/home/jbonofre/Workspace/beam/sdks/python/apache_beam/typehints/trivial_inference_test.py",
line 41, in testIndexing
self.assertReturnType(int, lambda x: x[0], [typehints.Tuple[int, str]])
  File
"/home/jbonofre/Workspace/beam/sdks/python/apache_beam/typehints/trivial_inference_test.py",
line 35, in assertReturnType
self.assertEquals(expected, trivial_inference.infer_return_type(f,
inputs))
AssertionError:  != Any
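For context, the assertion that fails here checks that indexing a value typed `Tuple[int, str]` with the literal 0 should infer `int`, while the real inference apparently returned `Any`. A toy stand-in for the contract the test asserts (this is not Beam's actual `trivial_inference` implementation, just the expected behavior):

```python
from typing import Any, Tuple


def infer_index_type(tuple_type, index):
    """Toy version of what the failing assertion expects: the element type
    of a typing.Tuple at a literal index, or Any when nothing is known."""
    args = getattr(tuple_type, "__args__", None)
    if not args or index >= len(args):
        return Any
    return args[index]


# The failure above means the real inference fell back to Any instead of int.
assert infer_index_type(Tuple[int, str], 0) is int
assert infer_index_type(int, 0) is Any
```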


Does someone else have the same issue ?

Thanks,
Regards
JB
-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


contributor in the Beam

2018-11-27 Thread Chaim Turkel
Hi,
  I have added a few features to the MongoDbIO and would like to add
them to the project.
I have read https://beam.apache.org/contribute/
I have added a Jira user; what do I need to do next?

chaim



MongoDbIO

2018-11-27 Thread Chaim Turkel
Hi,
  I would like to write a sync to validate that I have all records
from mongo in my BigQuery.
To do this I would like to bring the fields id,time from mongo to
BigQuery, and only for the missing documents read the full
document.
I did not see a way to bring a partial document?

chaim
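The sync described above can be illustrated independently of MongoDbIO: first read only the id/time projection, compare against what BigQuery already has, then fetch full documents just for the missing ids. A toy sketch with in-memory dicts standing in for both stores (all sample data made up; whether MongoDbIO can expose such a projection is exactly the open question in this thread):

```python
# Documents as they sit in Mongo (made-up sample data).
mongo_docs = [
    {"_id": 1, "time": "2018-11-27T10:00:00", "payload": "a"},
    {"_id": 2, "time": "2018-11-27T10:01:00", "payload": "b"},
    {"_id": 3, "time": "2018-11-27T10:02:00", "payload": "c"},
]


def partial_read(docs, fields=("_id", "time")):
    """The projection step: only id and time, not the full payload."""
    return [{k: d[k] for k in fields} for d in docs]


def missing_ids(partial, bigquery_ids):
    """Compare the cheap projection against what BigQuery already has."""
    return [p["_id"] for p in partial if p["_id"] not in bigquery_ids]


def full_read(docs, ids):
    """Second pass: full documents, but only for the missing ids."""
    return [d for d in docs if d["_id"] in ids]


bigquery_ids = {1, 3}  # records already synced
todo = missing_ids(partial_read(mongo_docs), bigquery_ids)
print(full_read(mongo_docs, set(todo)))  # only document 2, with its payload
```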
