Re: [sqlalchemy] Flask SQlAlchemy BaseQuery Paginate not working correctly
Hey Simon, I am indeed getting back duplicates of aggregate ids. I reworked my query to using exists like you suggested and now I get back correct result count. Thanks so much for your suggestion and saving my day! I can now move ahead on my work ticket. Many, many, thanks again, appreciate it!! Prerna On Saturday, August 15, 2020 at 3:22:02 PM UTC-4, Simon King wrote: > > This is the same problem: you're writing a query that joins 3 tables > together, and then applying a "LIMIT 20" to that query. If you look > carefully at your 20 rows of psql output, I expect you'll see the same > aggregates_id appear more than once. There are less than 20 distinct > Aggregate objects. When SQLAlchemy receives those rows, it skips the > duplicates. > > The SQL looks something like this: > > SELECT ... FROM aggregates > LEFT JOIN aggregate_blocks ON ... > LEFT JOIN blocks ON ... > LIMIT 20 > > You need the LIMIT to apply just to the aggregates table, rather than > the joined tables. This is a bit tricky because you want to filter by > a field in one of those joined tables. You could use an EXISTS > function, something like this: > > SELECT * FROM aggregates > WHERE EXISTS ( > SELECT 1 > FROM aggregate_blocks > INNER JOIN blocks ON aggregate_blocks.block_id = block.id > WHERE aggregate_blocks.aggregate_id = aggregates.id > AND blocks.is_complete = false > ) > LIMIT 20 > > If you run that in psql, I think you should get 20 *different* > aggregates rows back. > > If that works, then to turn it into SQLAlchemy ORM syntax, you should > use Query.exists(): > > > https://docs.sqlalchemy.org/en/13/orm/query.html#sqlalchemy.orm.query.Query.exists > > > Hope that helps, > > Simon > > On Fri, Aug 14, 2020 at 3:56 PM Prerna Pandit > wrote: > > > > Hey Simon, > > > > Thanks so much for replying to my question. I reworked my code to use > sqlalchemy ORM and took off flask and paginate so I can narrow down the > issue. My models now extend from declarative_base. > > > > engine = > create_engine('postgresql://postgres:postgres@localhost:5400/postgres') > > Session = sessionmaker(bind=engine) > > > > Base = declarative_base() > > session = Session() > > > > Models > > > > class Aggregate(Base): > > __tablename__ = 'aggregates' > > id = Column(UUID(as_uuid=True), primary_key=True, > >server_default=text('uuid_generate_v4()')) > > site_id = Column(UUID(as_uuid=True), nullable=True) > > created_at = Column(DateTime, default=sa.func.now()) > > created_by = Column(UUID(as_uuid=True), nullable=True) > > updated_at = Column(DateTime, default=sa.func.now(), > onupdate=sa.func.now()) > > updated_by = Column(UUID(as_uuid=True), nullable=True) > > blocks = relationship('AggregateBlock', cascade='all, > delete-orphan', > > passive_deletes=True, > back_populates='aggregate') > > > > > > class Block(Base): > > > > __tablename__ = 'blocks' > > id = Column(UUID(as_uuid=True), primary_key=True, > >server_default=text('uuid_generate_v4()')) > > type = Column(Text, nullable=False) > > heading = Column(Text, nullable=True) > > subheading = Column(Text, nullable=True) > > label = Column(Text, nullable=True) > > is_complete = Column(Boolean, default=False) > > created_at = Column(DateTime, default=sa.func.now()) > > created_by = Column(UUID(as_uuid=True), nullable=True) > > updated_at = Column(DateTime, default=sa.func.now(), > onupdate=sa.func.now()) > > updated_by = Column(UUID(as_uuid=True), nullable=True) > > aggregates = relationship('AggregateBlock', cascade='all, > delete-orphan', > > passive_deletes=True, > back_populates='block') > > > > > > class AggregateBlock(Base): > > __tablename__ = 'aggregate_blocks' > > id = Column(UUID(as_uuid=True), primary_key=True, > >server_default=text('uuid_generate_v4()')) > > block_id = Column(UUID(as_uuid=True), > > ForeignKey('blocks.id', ondelete='CASCADE'), > nullable=False, index=True) > > aggregate_id = Column(UUID(as_uuid=True), > > ForeignKey('aggregates.id', > ondelete='RESTRICT'), nullable=False) > > position = Column(Integer, nullable=False) > > block = relationship('Block', back_populates='aggregates') > > aggregate = relationship('Aggregate', back_populates='blocks') > > > > > > Query: > > > > select = session.query(Aggregate).order_by(Aggregate.created_at) \ > > .join(AggregateBlock) \ > > .join(Block) \ > > .filter(Block.is_complete == complete) \ > > > > all_results = select.all() > > limit_results = select.limit(20).all() > > > > > > I still get inconsistent results when I apply limit. Like select.all() > will return 47 rows but with limit it'll return
Re: [sqlalchemy] Flask SQlAlchemy BaseQuery Paginate not working correctly
This is the same problem: you're writing a query that joins 3 tables together, and then applying a "LIMIT 20" to that query. If you look carefully at your 20 rows of psql output, I expect you'll see the same aggregates_id appear more than once. There are less than 20 distinct Aggregate objects. When SQLAlchemy receives those rows, it skips the duplicates. The SQL looks something like this: SELECT ... FROM aggregates LEFT JOIN aggregate_blocks ON ... LEFT JOIN blocks ON ... LIMIT 20 You need the LIMIT to apply just to the aggregates table, rather than the joined tables. This is a bit tricky because you want to filter by a field in one of those joined tables. You could use an EXISTS function, something like this: SELECT * FROM aggregates WHERE EXISTS ( SELECT 1 FROM aggregate_blocks INNER JOIN blocks ON aggregate_blocks.block_id = block.id WHERE aggregate_blocks.aggregate_id = aggregates.id AND blocks.is_complete = false ) LIMIT 20 If you run that in psql, I think you should get 20 *different* aggregates rows back. If that works, then to turn it into SQLAlchemy ORM syntax, you should use Query.exists(): https://docs.sqlalchemy.org/en/13/orm/query.html#sqlalchemy.orm.query.Query.exists Hope that helps, Simon On Fri, Aug 14, 2020 at 3:56 PM Prerna Pandit wrote: > > Hey Simon, > > Thanks so much for replying to my question. I reworked my code to use > sqlalchemy ORM and took off flask and paginate so I can narrow down the > issue. My models now extend from declarative_base. > > engine = > create_engine('postgresql://postgres:postgres@localhost:5400/postgres') > Session = sessionmaker(bind=engine) > > Base = declarative_base() > session = Session() > > Models > > class Aggregate(Base): > __tablename__ = 'aggregates' > id = Column(UUID(as_uuid=True), primary_key=True, >server_default=text('uuid_generate_v4()')) > site_id = Column(UUID(as_uuid=True), nullable=True) > created_at = Column(DateTime, default=sa.func.now()) > created_by = Column(UUID(as_uuid=True), nullable=True) > updated_at = Column(DateTime, default=sa.func.now(), > onupdate=sa.func.now()) > updated_by = Column(UUID(as_uuid=True), nullable=True) > blocks = relationship('AggregateBlock', cascade='all, delete-orphan', > passive_deletes=True, > back_populates='aggregate') > > > class Block(Base): > > __tablename__ = 'blocks' > id = Column(UUID(as_uuid=True), primary_key=True, >server_default=text('uuid_generate_v4()')) > type = Column(Text, nullable=False) > heading = Column(Text, nullable=True) > subheading = Column(Text, nullable=True) > label = Column(Text, nullable=True) > is_complete = Column(Boolean, default=False) > created_at = Column(DateTime, default=sa.func.now()) > created_by = Column(UUID(as_uuid=True), nullable=True) > updated_at = Column(DateTime, default=sa.func.now(), > onupdate=sa.func.now()) > updated_by = Column(UUID(as_uuid=True), nullable=True) > aggregates = relationship('AggregateBlock', cascade='all, delete-orphan', > passive_deletes=True, back_populates='block') > > > class AggregateBlock(Base): > __tablename__ = 'aggregate_blocks' > id = Column(UUID(as_uuid=True), primary_key=True, >server_default=text('uuid_generate_v4()')) > block_id = Column(UUID(as_uuid=True), > ForeignKey('blocks.id', ondelete='CASCADE'), > nullable=False, index=True) > aggregate_id = Column(UUID(as_uuid=True), > ForeignKey('aggregates.id', > ondelete='RESTRICT'), nullable=False) > position = Column(Integer, nullable=False) > block = relationship('Block', back_populates='aggregates') > aggregate = relationship('Aggregate', back_populates='blocks') > > > Query: > > select = session.query(Aggregate).order_by(Aggregate.created_at) \ > .join(AggregateBlock) \ > .join(Block) \ > .filter(Block.is_complete == complete) \ > > all_results = select.all() > limit_results = select.limit(20).all() > > > I still get inconsistent results when I apply limit. Like select.all() will > return 47 rows but with limit it'll return anywhere between 11 to 15. If I > take the generated SQL query and run it directly in psql, I get the correct > count. > > SELECT aggregates.id AS aggregates_id, aggregates.site_id AS > aggregates_site_id, aggregates.created_at AS aggregates_created_at, > aggregates.created_by AS aggregates_created_by, aggregates.updated_at AS > aggregates_updated_at, aggregates.updated_by AS aggregates_updated_by > FROM aggregates JOIN aggregate_blocks ON aggregates.id = > aggregate_blocks.aggregate_id JOIN blocks ON blocks.id = > aggregate_blocks.block_id > WHERE blocks.is_complete = false ORDER BY aggregates.created_at > > LIMIT 20 > > > > > > On Friday, August 14, 2020 at 10:08:23 AM UTC-4, Simon King
Re: [sqlalchemy] Flask SQlAlchemy BaseQuery Paginate not working correctly
Hey Simon, Thanks so much for replying to my question. I reworked my code to use sqlalchemy ORM and took off flask and paginate so I can narrow down the issue. My models now extend from declarative_base. engine = create_engine('postgresql://postgres:postgres@localhost:5400/postgres') Session = sessionmaker(bind=engine) Base = declarative_base() session = Session() Models class Aggregate(Base): __tablename__ = 'aggregates' id = Column(UUID(as_uuid=True), primary_key=True, server_default=text('uuid_generate_v4()')) site_id = Column(UUID(as_uuid=True), nullable=True) created_at = Column(DateTime, default=sa.func.now()) created_by = Column(UUID(as_uuid=True), nullable=True) updated_at = Column(DateTime, default=sa.func.now(), onupdate=sa.func.now()) updated_by = Column(UUID(as_uuid=True), nullable=True) blocks = relationship('AggregateBlock', cascade='all, delete-orphan', passive_deletes=True, back_populates='aggregate') class Block(Base): __tablename__ = 'blocks' id = Column(UUID(as_uuid=True), primary_key=True, server_default=text('uuid_generate_v4()')) type = Column(Text, nullable=False) heading = Column(Text, nullable=True) subheading = Column(Text, nullable=True) label = Column(Text, nullable=True) is_complete = Column(Boolean, default=False) created_at = Column(DateTime, default=sa.func.now()) created_by = Column(UUID(as_uuid=True), nullable=True) updated_at = Column(DateTime, default=sa.func.now(), onupdate=sa.func.now()) updated_by = Column(UUID(as_uuid=True), nullable=True) aggregates = relationship('AggregateBlock', cascade='all, delete-orphan', passive_deletes=True, back_populates='block') class AggregateBlock(Base): __tablename__ = 'aggregate_blocks' id = Column(UUID(as_uuid=True), primary_key=True, server_default=text('uuid_generate_v4()')) block_id = Column(UUID(as_uuid=True), ForeignKey('blocks.id', ondelete='CASCADE'), nullable=False, index=True) aggregate_id = Column(UUID(as_uuid=True), ForeignKey('aggregates.id', ondelete='RESTRICT'), nullable=False) position = Column(Integer, nullable=False) block = relationship('Block', back_populates='aggregates') aggregate = relationship('Aggregate', back_populates='blocks') Query: select = session.query(Aggregate).order_by(Aggregate.created_at) \ .join(AggregateBlock) \ .join(Block) \ .filter(Block.is_complete == complete) \ all_results = select.all() limit_results = select.limit(20).all() I still get inconsistent results when I apply limit. Like select.all() will return 47 rows but with limit it'll return anywhere between 11 to 15. If I take the generated SQL query and run it directly in psql, I get the correct count. SELECT aggregates.id AS aggregates_id, aggregates.site_id AS aggregates_site_id, aggregates.created_at AS aggregates_created_at, aggregates.created_by AS aggregates_created_by, aggregates.updated_at AS aggregates_updated_at, aggregates.updated_by AS aggregates_updated_by FROM aggregates JOIN aggregate_blocks ON aggregates.id = aggregate_blocks.aggregate_id JOIN blocks ON blocks.id = aggregate_blocks.block_id WHERE blocks.is_complete = false ORDER BY aggregates.created_at LIMIT 20 On Friday, August 14, 2020 at 10:08:23 AM UTC-4, Simon King wrote: > > "paginate" is not an SQLAlchemy function, so you'd be better off > asking the author of whatever is providing that feature. > > However, I would guess that maybe paginate is naively applying > something like "LIMIT 20" to the query. This doesn't work properly > when you join along a one-to-many relationship, because if you have > (for example) 2 "parent" objects, each with 5 "child" objects, the > query will return 10 rows, but SQLAlchemy de-duplicates the results to > return just the 2 parent objects. > > Simon > > On Thu, Aug 13, 2020 at 3:31 PM Prerna Pandit > wrote: > > > > Hello, I've been struggling with this issue for the past couple of days > and would really, truly appreciate if someone could please give me > pointers or direction as to what I might be missing. > > > > > > Here are my models; > > class Aggregate(db.Model): > > id = db.Column(UUID(as_uuid=True), primary_key=True, > >server_default=db.text('uuid_generate_v4()')) > > blocks = db.relationship('AggregateBlock', cascade='all, > delete-orphan', > > passive_deletes=True, > back_populates='aggregate').. > > > > > > class AggregateBlock(db.Model): > > > > id = db.Column(UUID(as_uuid=True), primary_key=True, > >server_default=db.text('uuid_generate_v4()')) > > block_id = db.Column(UUID(as_uuid=True), > >
Re: [sqlalchemy] Flask SQlAlchemy BaseQuery Paginate not working correctly
"paginate" is not an SQLAlchemy function, so you'd be better off asking the author of whatever is providing that feature. However, I would guess that maybe paginate is naively applying something like "LIMIT 20" to the query. This doesn't work properly when you join along a one-to-many relationship, because if you have (for example) 2 "parent" objects, each with 5 "child" objects, the query will return 10 rows, but SQLAlchemy de-duplicates the results to return just the 2 parent objects. Simon On Thu, Aug 13, 2020 at 3:31 PM Prerna Pandit wrote: > > Hello, I've been struggling with this issue for the past couple of days and > would really, truly appreciate if someone could please give me pointers or > direction as to what I might be missing. > > > Here are my models; > class Aggregate(db.Model): > id = db.Column(UUID(as_uuid=True), primary_key=True, >server_default=db.text('uuid_generate_v4()')) > blocks = db.relationship('AggregateBlock', cascade='all, delete-orphan', > passive_deletes=True, > back_populates='aggregate').. > > > class AggregateBlock(db.Model): > > id = db.Column(UUID(as_uuid=True), primary_key=True, >server_default=db.text('uuid_generate_v4()')) > block_id = db.Column(UUID(as_uuid=True), > db.ForeignKey('blocks.id', ondelete='CASCADE'), > nullable=False, index=True) > aggregate_id = db.Column(UUID(as_uuid=True), > db.ForeignKey('aggregates.id', > ondelete='RESTRICT'), nullable=False) > block = db.relationship('Block', back_populates='aggregates') > aggregate = db.relationship('Aggregate', back_populates='blocks') > > > > > class Block(db.Model): > id = db.Column(UUID(as_uuid=True), primary_key=True, >server_default=db.text('uuid_generate_v4()')) > is_complete = db.Column(db.Boolean, default=False) > aggregates = db.relationship('AggregateBlock', cascade='all, > delete-orphan', > passive_deletes=True, back_populates='block') > > > from flask_sqlalchemy import SQLAlchemy > db = SQLAlchemy() > > select = > db.session.query(Aggregate).join(AggregateBlock).join(Block).filter(Block.is_complete > == complete) > > print(len(select.all()) > > print(len(select.paginate(per_page=20).items()) > > > If I do a select.all(), I get the right number of rows which is 47. However, > if I try to paginate for a per_page size say 20, I lot a less rows like 11. > select.paginate(per_page=20). > The number could go up to 21 or so as I increase the page size. Why would > paginate decrease the number of returned records? > > -- > SQLAlchemy - > The Python SQL Toolkit and Object Relational Mapper > > http://www.sqlalchemy.org/ > > To post example code, please provide an MCVE: Minimal, Complete, and > Verifiable Example. See http://stackoverflow.com/help/mcve for a full > description. > --- > You received this message because you are subscribed to the Google Groups > "sqlalchemy" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to sqlalchemy+unsubscr...@googlegroups.com. > To view this discussion on the web visit > https://groups.google.com/d/msgid/sqlalchemy/e5a14305-4e63-4467-9610-1faf3f8c8412o%40googlegroups.com. -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/CAFHwexeg0K1URZZbi4oaQHtLFkCVbE3Fn2%2BY%3DDaaZymUp9S-qg%40mail.gmail.com.
[sqlalchemy] Flask SQlAlchemy BaseQuery Paginate not working correctly
Hello, I've been struggling with this issue for the past couple of days and would really, truly appreciate if someone could please give me pointers or direction as to what I might be missing. Here are my models; class Aggregate(db.Model): id = db.Column(UUID(as_uuid=True), primary_key=True, server_default=db.text('uuid_generate_v4()')) blocks = db.relationship('AggregateBlock', cascade='all, delete-orphan', passive_deletes=True, back_populates='aggregate').. class AggregateBlock(db.Model): id = db.Column(UUID(as_uuid=True), primary_key=True, server_default=db.text('uuid_generate_v4()')) block_id = db.Column(UUID(as_uuid=True), db.ForeignKey('blocks.id', ondelete='CASCADE'), nullable=False, index=True) aggregate_id = db.Column(UUID(as_uuid=True), db.ForeignKey('aggregates.id', ondelete= 'RESTRICT'), nullable=False) block = db.relationship('Block', back_populates='aggregates') aggregate = db.relationship('Aggregate', back_populates='blocks') class Block(db.Model): id = db.Column(UUID(as_uuid=True), primary_key=True, server_default=db.text('uuid_generate_v4()')) is_complete = db.Column(db.Boolean, default=False) aggregates = db.relationship('AggregateBlock', cascade='all, delete-orphan', passive_deletes=True, back_populates= 'block') from flask_sqlalchemy import SQLAlchemy db = SQLAlchemy() select = db.session.query(Aggregate).join(AggregateBlock).join(Block).filter(Block.is_complete == complete) print(len(select.all()) print(len(select.paginate(per_page=20).items()) If I do a select.all(), I get the right number of rows which is 47. However, if I try to paginate for a per_page size say 20, I lot a less rows like 11. select.paginate(per_page=20). The number could go up to 21 or so as I increase the page size. Why would paginate decrease the number of returned records? -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/e5a14305-4e63-4467-9610-1faf3f8c8412o%40googlegroups.com.